Introduction


Curiosity is one of the most striking features of human behaviour. People are 
constantly meddling with things, taking them to pieces, trying to find out how they 
work, and seeing what happens when they are altered. This curiosity has proved 
to be immensely fruitful, for on it depends much of present-day scientific and 
technological knowledge. Human beings experiment: they actively interfere with 
their surrounding world, and as a result they learn about its properties and learn 
to manipulate it to their own ends. It is the theme of experimentation that forms 
the subject matter of this unit. 


The focus in this unit is on one particular aspect of medicine: drug-testing. This 
is one of the most important areas in which scientific experimentation has been 
applied. Experiments were discussed quite generally in Unit 10 and, as in that 
unit, in parts we will concentrate on collecting data (stage 2 of the statistical 
modelling diagram). However, we shall also use techniques and concepts from 
earlier units to analyse the data, and we will see that the method of analysis to 
be used must be taken into account when an experiment is being planned. In 
drug-testing, and in the whole of scientific experimentation, statistics play a very 
important role. It is this role that is described in this unit. Also, since an 
experiment is performed to answer specific questions about the real world, its 
results must be interpreted in terms of what they say about that world. So this 
unit also provides further examples of how the statistical ideas you have learnt in 
M140 fit into the process of making decisions in the real world. 


Hundreds of new drugs are put on the market every year. Their manufacturers 
usually claim that the new product is better than existing ones, either because it 
is more effective or because it has fewer side effects. But how does a 
manufacturer discover what the effects of a new drug really are? With the health 
of thousands or millions of potential consumers at stake, there is certainly no 
room for serious mistakes. This means that each new drug has to be thoroughly 
tested, both to ensure that it does have the desired effect, and also to ensure that 
it does not have undesired side effects. Drug manufacturers therefore have to 
carry out numerous experiments in which they test the effects of new drugs, 
usually first on animals, or on tissues taken from animals, and then on human 
beings. After each stage in the drug-testing procedure, they have to decide 
whether to continue with the next stage, or whether to reject the new drug. The 
nature of their decision will be determined in part by the statistical information 
that they receive about the effects of the drug, as revealed by the tests they have 
conducted. Examples of questions that a drug manufacturer may ask are as 
follows. 


• What is the average effect of the new drug on, say, body weight?
• How variable is this effect?
• How does the effect compare with that of other well-known and well-tested
  drugs?


Of course, many other factors will also be taken into account in deciding whether 
to continue with further stages of testing: for example, the cost involved, the 
seriousness and prevalence of the medical conditions for which the drug might 
be used, and the availability of existing alternatives. 


Statistical information has a role in making practical decisions in various aspects 
of everyday life. Almost always, as in drug-testing, statistical information is just 
one among many factors that have to be taken into account when coming to a 
decision. You will see how the relative importance of statistical and other sources 
of information should be assessed in decision-making. This unit is concerned 
with the practical side of statistics and decision-making, but in a rather narrow 
fashion. If a drug company (and the public) is to have any confidence in the data 
that emerges from the tests it has carried out on a new drug, then those tests 
must be carried out according to certain rules. If these rules are not observed, 
then the results of the tests could easily be worthless. We shall describe these 
rules in some detail. 


Much of this unit is devoted to a detailed discussion of the principles of 
drug-testing but, before this, in Section 1 we outline the various stages of testing 
a new drug and getting it licensed for use. This first section provides only 
background information, so you do not have to remember its details, nor will it be 
assessed, but you may wish to refer back to it when studying the rest of the unit. 
Section 2 describes the principles and methods used in clinical trials, which are 
one of the important methods of testing drugs on human patients. Then 
Section 3 looks in detail at the design of some clinical trials. This leads naturally 
to the analysis of the data obtained from such trials, and Section 4 shows the 
close relationship between this analysis and the design of the trial. We then 
consider, in Section 5, some of the limitations of clinical trials and the further 
work that needs to be undertaken to ensure that the drugs which are marketed 
are as beneficial as possible to society. In Section 6, the whole process of testing 
and launching a new drug is illustrated by a case study. Finally, Section 7 directs 
you to the Computer Book. 


This unit, unlike the others in M140, contains many references to its sources of 
information in ‘Harvard style’. This means that the full reference is in a separate 
References section towards the end of the unit, with only a brief mention such as 
‘(Christiansen, 2006)’ or ‘SMC (2008)’ in the main unit text. You are not expected 
to follow up the detailed reference, but the information is there if you wish to. 


1 Drug research and testing 


Section 1 is for background information only and will not be assessed. 


A new drug goes through a long series of tests on animals and human beings 
before it can be put on the market. The first stage is screening. Large numbers 
of chemical compounds are synthesised in the laboratories of drug companies, 
and it is necessary to identify those which have properties that are likely to be of 
interest. 


Tests are devised which, it is hoped, will predict whether each compound has the 
desired features. The tests may use bacteria grown in cultures in the laboratory, 
cells or other material taken from animals or humans, or they may be tests on 
animals. In view of the large numbers of compounds involved, these tests must 
be relatively quick and easy to perform. Many compounds are rejected at this 
stage, but the performance of some indicates that they merit further investigation. 


Screening is followed by a much fuller series of tests on those compounds that 
seem to be of interest. Many aspects of the action of the drug will be looked at. 
Most drugs have both desirable and undesirable (or toxic) effects. Tests at this 
stage will indicate both the size of dose of the drug needed to produce the 
desirable effects and the size of dose that produces the toxic effects. There will 
also be tests of the possibility that the compound may cause cancer or birth 
defects. 


1.1 The phases of drug-testing 


If the drug seems to have desirable effects, if toxic actions are only found at 
doses much higher than those required to produce these desirable effects, and if 
there is no tendency to cause cancer or defective offspring, then tests in human 
beings may be started. Conventionally, the tests on human beings are divided 
into four phases. 


Phase 1: Early clinical pharmacology 


The first people to take the drug will normally be healthy volunteers rather than 
patients, and will be closely monitored 24 hours a day within a clinic. The drug will 
initially be given in single doses that are smaller than those expected to be 
effective. The biological action and safety of the drug will be evaluated, before 
the dose is gradually increased. In some of these studies, the amount of the drug 
in the bloodstream or urine will be measured, so that the rate of absorption of the 
drug, and its rate of elimination from the body, can be assessed. Later on, 
studies will be carried out in which the volunteers take the drug repeatedly over a 
period of time. 


Phase 2: Early clinical investigations 


These are usually the first studies involving patients with the condition the drug is 
intended to treat. The aims here are as follows: to get an idea of the best dose to 
use; and to study the efficacy of the drug — that is, its ability to have the effects it 
is designed to produce. It may be that particular symptoms of the disease 
respond well to the drug, or that a certain type of patient responds better than 
others. Thus this phase is largely concerned with forming hypotheses about the 
action of the drug. 


Phase 3: Comparative studies 


In this phase the hypotheses formed in phase 2 are tested. To do this, 
comparative studies, called clinical trials, are needed. Treatment with the new 
drug is compared with existing therapy or, sometimes, with no treatment. A 
series of different dosages and dosage schedules (i.e. how often the drug is 
taken) are compared. A wider variety of patients will also be treated, including, 
for example, elderly patients and those suffering from more than one disease. 
This phase completes the work necessary to register the drug (i.e. to obtain 
permission to market it). 


Phase 4: Post-marketing studies 


This phase includes studies to provide further evidence about the safety of the 
drug. These use a larger sample of patients than can be obtained before 
marketing. It also includes various market research studies. 


Although both scientific experiments and statistical analysis are important during 
each of these four phases, they are particularly well illustrated in clinical trials 
carried out during phase 3 (and sometimes phase 2). This unit therefore 
concentrates on these clinical trials. 


1.2 The licensing of a new drug 


All medicines, whether available only on prescription or freely over-the-counter, 
must be licensed. A Europe-wide licence may be granted by the European 
Commission after evaluation by the Committee for Medicinal Products for 
Human Use (CHMP) or another specialist committee within the European 
Medicines Agency (EMA). The CHMP is composed of one member from each 
EU country, all of whom are specialists in the field (including physicians, 
pharmacists, pharmacologists and toxicologists). The CHMP advises on whether 
a new drug should be licensed, basing its decisions on scientific criteria that 
determine whether new medicines meet safety, efficacy and quality requirements 
evidenced by clinical trial results. For each drug that is granted authorisation for 
a licence, the CHMP publishes a European public assessment report. This 
provides details of the assessment process and states the grounds on which the 
committee recommended authorisation. It also gives a summary of the product 
characteristics, the product's labelling, and the patient information leaflet. 


The EMA is concerned not only with applications to market new drugs and to use 
old drugs for new purposes. It also has an important role in the monitoring of 
drugs that are already on the market. From time to time a doctor will come 
across a complaint, an illness or perhaps a death that they suspect may be 
caused by a drug. There is a Europe-wide database, EudraVigilance, where 
reports of adverse drug reactions are held. These reports are carefully 
monitored, and where necessary, the EMA advises the European Commission to 
change a medicine's licence. These methods of gathering information about the 
unwanted effects of drugs are discussed in more detail in Section 5. 


Once a drug has been launched, doctors are allowed to prescribe it to patients 
who meet the criteria of the licence. However, for prescribing in Britain under the 
National Health Service (NHS), a treatment for a specific condition must be 
approved by the National Institute for Health and Care Excellence (NICE) in 
England, Wales and Northern Ireland or the Scottish Medicines Consortium 
(SMC) in Scotland before doctors are freely allowed to prescribe it. These bodies 
evaluate the cost-effectiveness and efficacy of treatments in order to approve, 
and give guidance on, their use under the NHS. 


Example 1 NICE and sorafenib 


The drug sorafenib (marketed as Nexavar), used in the treatment of liver and 
kidney cancers, was licensed by the European Commission in 2006. The licence 
was approved on evidence from two phase 3 clinical trials (EMA, 2007a and 
2007b) comparing sorafenib with treatment containing no medication. The first 
trial involved 602 liver cancer patients. The second trial involved 903 kidney 
cancer patients in whom previous cancer treatment had stopped working. 
Sorafenib was shown to increase the length of time patients survived in both 
trials by an average of 2.8 and 3.4 months respectively. The drug was found to 
have several unwanted side effects, but the benefit to patient survival was 
considered to outweigh the risks. 


Sorafenib is a very expensive treatment. The cost of treatment is not a 
consideration in the drug licensing approval process. However, when the drug 
was evaluated by NICE (2009, 2010) and SMC (2008), it was not found to be 
cost-effective, so sorafenib was not approved for prescribing under the NHS. 


2 Clinical trials 


In this section, we shall discuss in some detail the experiments that need to be 
conducted to test drugs. As already mentioned, such experiments are called 
clinical trials. First we shall describe one of the first-ever clinical trials, carried out 
in 1747 by a physician serving in the Royal Navy, James Lind (1716-1794). 


Example 2 James Lind’s scurvy experiment 


James Lind (see Lind, 1753) carried out an experiment to test various treatments 
for scurvy, a disease now known to result from vitamin C deficiency. Scurvy 
patients develop spots, spongy gums and weak knees, and generally feel unwell. As 
scurvy advances, patients may develop open wounds with pus, jaundice, fever 
and loss of teeth, and eventually die. Scurvy was once common among sailors 
on board ships away at sea for periods longer than it was possible to store most 
fresh fruit and vegetables, and it resulted in many deaths. 


On 20 May 1747, James Lind was the surgeon on board the HMS Salisbury 
patrolling the Bay of Biscay. At the time, the cause of scurvy was unknown, but 
Lind had reviewed the available literature on scurvy to learn as much as he 
could. Lind took 12 patients with scurvy, as similar as possible, all with ‘putrid 
gums, the spots and lassitude and weakness of their knees’. They were fed the 
same general diet, but this was supplemented with an additional treatment: 


• two were given a quart of cider daily 
• two were given 25 drops of elixir of vitriol (sulfuric acid) three times per day 
• two were given two spoonfuls of vinegar three times per day 
• two were given half a pint of sea-water daily 
• two were given two oranges and one lemon 
• two were given bigness of a nutmeg (a spicy paste) three times per day. 


Lind observed ‘most sudden and visible good effects’ on the two patients given 
oranges and lemons; both were fit for duty at the end of six days. 


2.1 Drug-testing by experiment 


At first sight, nothing would seem simpler than testing the effects of a drug. 
Simply give a person suffering from the relevant disease a dose of the drug, and 
see what happens. There are many reasons — some obvious, others quite subtle 
— why such a simple procedure might not work. To discover some of these 
reasons, the following activities consider the use of aspirin to treat headaches. 


Activity 1 Was it the aspirins? 


Suppose that you have a headache, and take two aspirins. An hour later the 
headache has gone. What do you conclude about the effectiveness of aspirin as 
a pain-killer? 


Activity 2 Any better? 


Suppose that you took a group of 20 people, all with headaches, and gave them 
each two aspirin. An hour later, 16 of them said they had no headache. What do 
you now say about the effectiveness of aspirin as a pain-killer? 


‘I couldn't afford a control group so I decided to go with an out-of-control group.’ 


Even without treatment, headaches will often go away after a while. To gain a 
better idea of whether aspirin helps headaches go away in the above type of test, 
you need another group of people with headaches who do not take aspirin. You 
can then compare the recovery rate of those who take aspirin with the recovery 
rate of those who do not. 
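
The comparison just described can be sketched in a few lines of Python. The 16 recoveries out of 20 in the aspirin group are the figures from Activity 2; the figures for the group without aspirin are invented purely for illustration, and the function name recovery_rate is our own.

```python
def recovery_rate(recovered, group_size):
    """Proportion of a group whose headache had gone after one hour."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return recovered / group_size


# Aspirin group: 16 of 20 recovered (figures from Activity 2).
aspirin_rate = recovery_rate(16, 20)

# Group without aspirin: 10 of 20 recovered (hypothetical figures).
no_aspirin_rate = recovery_rate(10, 20)

# The quantity of interest is the gap between the two recovery rates.
difference = aspirin_rate - no_aspirin_rate

print(f"Aspirin group:    {aspirin_rate:.0%}")
print(f"No-aspirin group: {no_aspirin_rate:.0%}")
print(f"Difference:       {difference:.0%}")
```

Taken alone, the recovery rate in the aspirin group says little. It is the difference between the two rates that bears on whether aspirin helps, and whether a difference of a given size could have arisen by chance is a separate question.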


Such a comparison, between a group that does undergo some form of treatment 
and a group that does not, is fundamental to drug-testing. It is also fundamental 
to many other kinds of scientific investigation, as you have already seen in 
Unit 10. Such a comparison is an experiment. In such an experiment, the group 
that receives the experimental treatment is referred to as the experimental 
group, and the group that does not receive the experimental treatment is 
referred to as the control group. 


In an ideal experiment, the experimental group and the control group resemble 
each other in every respect except for the one being tested. If this ideal were 
achieved for the aspirin experiment, and you found that all of the experimental 
group (the aspirin-tested people) recovered from their headaches within an hour, 
whereas none of the control group did, then you could be virtually certain 
(barring an extraordinary fluke) that the aspirin had really cured the headaches. 


However, it is by no means an easy task to make sure that the experimental and 
control groups resemble each other in every respect apart from the one being 
tested. 


Activity 3 Using a control group 


Consider an experimental group of headache sufferers who have been given 
aspirin, and a control group who have not. How do the two groups differ other 
than in the presence or absence of the drug in the body? 


Treating a patient can quite commonly appear to have a therapeutic effect 
(or can actually have a therapeutic effect), even when the treatment 
contains no medication and should be ineffectual. This is called the 
placebo effect. 


People have all sorts of expectations about medicines that they receive from the 
doctor, as well as about other types of treatment. These are derived from their 
own past experiences, from the attitudes of their friends and family, and also from 
the attitude and expectations of the doctor. These attitudes can have an 
important influence on the outcome of treatment. This is by no means confined 
to neurotic or suggestible people, or to hypochondriacs. Placebo effects, as you 
might expect, tend to be most noticeable in mild conditions, but they cannot be 
neglected, even in quite severe conditions. Diabetes and asthma, for example, 
have both been shown to be sensitive to such effects. 


Activity 4 Controlling for the placebo effect 


How can the placebo effect be overcome in the experiment on aspirin in 
Activity 3? 


A dummy treatment that superficially resembles the treatment being tested 
but contains no active ingredient is called a placebo. 


A clinical trial (i.e. experiment) in which the control group takes a placebo is 
called a placebo-controlled trial. There is one ethical problem with 
placebo-controlled trials: patients go to their doctors to get treatment, not 
dummies. If you assume that the patients agree to take part in a drug-testing 
experiment, you may be able to think of circumstances where there would be no 
ethical problem in using a placebo as a control. Three such circumstances spring 
to mind: 


• Where no effective treatment exists for the disease in question. 
• Where the condition is mild. 
• Where the new treatment is an addition to an existing treatment so that the 
  comparison is between old treatment plus placebo and old treatment plus new 
  treatment. 


Placebo-controlled trial 


In a placebo-controlled trial, there is a treatment group and a control group. 
People in the treatment group receive the treatment being tested, while 
those in the control group are given a placebo. 


In the case of a serious, or fairly serious illness, where an effective treatment 
already exists, there would obviously be no justification for leaving a person 
untreated. What should be done in such a case? The answer is that the existing 
treatment rather than a placebo should be given to the control group. 


Such questions are, of course, ethical and they can be answered only by taking 
up a particular ethical position. The position implied in the answers given above 
is that it is necessary to do the following: 


• Minimise the possibilities of causing harm and lack of benefit to a person from 
  a clinical trial. 
• Maximise the knowledge gained about the effectiveness and risks of the drugs. 
• Make sure that the person being treated knows what is happening and agrees 
  to take part in the trial (this is called informed consent). 


Some examples will illustrate these issues more clearly. 


Example 3 Smoking cessation therapies 


There have been many trials of smoking cessation therapies (treatments to help 
people stop smoking) that are placebo-controlled. Many smokers quit without 
treatment, and some study participants may be more likely to quit because they 
expect the new therapy to help. It has been argued that placebo-controlled trials 
are unethical because quitting smoking is an important change that improves 
health, and several well-accepted therapies have been shown to increase 
cessation rates. Others argue that most people in the control group would not 
take any therapy to help them stop smoking if they were not in the study, and a 
placebo effect could help them quit smoking. 


Example 4 Common cold 


The common cold is incurable in the sense that no known drug can kill the 
viruses that cause it. On the other hand, drugs are available that can alleviate 
the symptoms (such as sneezing, headache, watering eyes, blocked nose). If a 
drug company develops a new drug that they think will kill the virus, then they 
would want to test it on an experimental group of people with colds, and to use 
as a control group people who had colds but were not taking any existing drugs 
that relieve the symptoms. In this way they could clearly distinguish the effects of 
the new drug and would not feel too unhappy about asking the control group to 
suffer their colds without treatment. 


Activity 5 Placebo for scurvy 


Would it have been ethical for James Lind to include a placebo group in his 
experiment on scurvy patients? 


2.2 Measurement 


You may remember that in Subsection 2.1, to assess the effectiveness of aspirin, 
we suggested that people should be asked whether or not they still had a 
headache one hour after taking two aspirin or two placebos. 


Activity 6 A good question? 


Suppose you are designing a placebo-controlled trial to test aspirin’s 
effectiveness at reducing headaches. For comparing the treatments, what 
problems might there be in asking each subject if the headache is still present 
one hour after taking the tablets? 


‘This doesn't look good. I'm afraid you've developed an immunity to placebos.’ 


As noted in the solution to Activity 6, there are a number of ways that aspirin and 
a placebo might differ, both with regard to changes in the severity of the 
headache and the time the headache lasts. Some of these differences would be 
missed by asking a single yes/no question of the form ‘Did you still have a 
headache an hour after taking the tablets?’. 


To get round this problem, you might ask the people to rate their headache on a 
scale such as this: 


• 0 - None. No headache. 
• 1 - Mild. The headache is there but it does not bother me much. 
• 2 - Moderate. The headache is definitely a nuisance. 
• 3 - Severe. The headache is so bad that I can hardly think of anything else. 


You could ask them to rate their headaches on this scale before taking aspirin, 
and then every 15 minutes over a period of, say, 4 hours. 


What is the purpose of taking a rating before treatment? This is done because 
the aspirin might improve a headache without completely curing it, and this 
would still be better than nothing. It is possible to judge improvement only if you 
know how bad the headache is to start with. 
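
As a rough illustration of why the rating before treatment matters, the sketch below records ratings on the 0 to 3 scale above and works out the fall from the starting rating. The two people and all of their ratings are hypothetical, and the function name improvement is our own.

```python
# Ratings use the scale above: 0 none, 1 mild, 2 moderate, 3 severe.
# All ratings below are hypothetical, for illustration only.
ratings = {
    "person A": {"before": 3, "after_1_hour": 1},
    "person B": {"before": 1, "after_1_hour": 1},
}


def improvement(before, after):
    """Fall in rating from the starting value; positive means the headache eased."""
    return before - after


for person, r in ratings.items():
    change = improvement(r["before"], r["after_1_hour"])
    still_has_headache = r["after_1_hour"] > 0
    print(person, "| improvement:", change, "| still has headache:", still_has_headache)
```

Both people would answer 'yes' to a question asking whether the headache is still present after an hour, yet only person A has improved. The difference between them is visible only because the starting rating was recorded.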


Activity 7 Is a subjective scale OK? 


Judging the severity of a headache is not easy. The measure is, by its nature, 
subjective. People do not find it easy to assess their own discomfort very 
accurately, or reliably. Does this matter? 


Although many symptoms, like pain, must be assessed subjectively, others are 
amenable to quantitative and, relatively speaking, more objective measurement. 
Your doctor can, for example, measure your blood pressure or your pulse-rate 
and express these measurements in widely understood quantitative terms (units 
of pressure or beats per minute). If you receive some form of treatment that 
alters your blood pressure or pulse-rate, then these changes can also be 
expressed in quantitative terms. Measurements such as these are objective 
measurements, in contrast to the subjective measurements mentioned earlier. 


‘Objective’ is, of course, a relative term. Different doctors using exactly the same 
equipment on the same patient might well obtain slightly different values of the 
quantity they are measuring, simply because they use the equipment in slightly 
different ways. Different doctors using different pieces of equipment would be still 
more variable in the measurements they produced. Although objective 
measurements are more cut-and-dried than subjective measurements, this does 
not mean that they possess absolute accuracy, nor that they are necessarily, 
under all circumstances, of greater value to the doctor than subjective 
measurements. Nor does it mean that objective measurements are free from 
placebo effects. Objective measurements can be just as susceptible to placebo 
effects as subjective ones. You are probably aware that if you are anxious your 
heart rate (the number of beats per minute of the heart) is likely to go up, as is 
your blood pressure. The outcome of objective measurements can be influenced 
by the expectations and attitudes of the patient to treatment almost as much as 
can subjective measurements. Therefore, whether the effectiveness of the 
treatment is measured objectively or subjectively, it is important that the clinical 
trial be properly designed using a control group. 


Activity 8 Beta-blocker and placebo 


Suppose that a research laboratory is setting up a clinical trial of the effect of a 
beta-blocker on blood pressure. One group of people takes the beta-blocker 
tablet, whilst the other group takes a dummy tablet (a placebo). Does it matter if 
the people know which kind of tablet they are taking? 


A clinical trial in which individuals do not know whether they are taking the drug 
or the placebo is called a blind trial. 


Activity 9 Should the doctor know who gets the placebo? 


For the clinical trial in Activity 8, what about the doctors who are giving the 
treatment to the patient: does it matter if they know which people receive which 
treatment? 


If the doctors in a research laboratory are comparing two treatments, then they 
may also have expectations. They may be actively involved in the research 
project and be enthusiastic about one of the treatments. Such enthusiasm can 
be conveyed to the patients and change their expectations. Or a doctor might be 
sceptical about one of the treatments. This too can be conveyed to the patients 
and affect the outcome of the trial. (We shall use trial as short for clinical trial 
throughout this unit now.) 


As well as such direct effects on the patients, there are effects caused by the 
doctors’ involvement in assessing the outcome, i.e. their attitudes may affect their 
assessment of the outcome. They may, for example, tend to assess the patients 
on a new drug as doing better than those on a placebo; or, because they are 
aware of this possibility, they may bend over backwards to be fair, 
over-compensate, and so assess the patients on a placebo as better. Each of 
these effects will tend to shift the results in a particular direction (but they may 
partially cancel each other out). Such effects are examples of bias. 


Activity 10 Bias in James Lind’s scurvy experiment 


When James Lind conducted his experiment on scurvy patients, his two worst 
patients ‘with tendons in the ham rigid (a symptom none of the rest had)’ were 
given a half pint of sea-water daily as their treatment. There were only two 
patients assigned to each of six treatments. How might this have biased his 
results on the sea-water treatment in a particular direction? 


The doctors who give the treatment should therefore also be ignorant of the 
nature of the treatment being given. (This may be difficult to achieve in practice, 
especially if the patient’s reactions to the drug are obvious to the doctor.) 


A trial in which neither patients nor doctors know which treatment is 
administered is called a double-blind trial. A study in which the patient is 
blind but the doctor is not, or vice versa, is sometimes called a single-blind 
trial. 


As far as is possible, all controlled clinical trials are double-blind trials. You may 
wonder how anybody knows which kind of tablet a patient is receiving in a 
double-blind trial. Usually a third party, such as a pharmacist who is independent 
of the doctors and patients, keeps a record of which tablets have been 
administered by which doctors to which patients. 
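
One way to picture the third party's record is as two separate tables: one that doctors and patients may see, which contains only coded tablet labels, and a key that only the third party holds. The Python sketch below is a minimal illustration under our own assumptions: the patient identifiers, the label format and the function name allocate are hypothetical, and the random allocation of treatments is a choice made for the sketch rather than something this section has specified.

```python
import random


def allocate(patient_ids, seed=None):
    """Build the two records needed for a double-blind trial.

    labels: patient -> coded tablet label (visible to doctors and patients).
    key:    coded tablet label -> treatment (held only by the third party).
    """
    rng = random.Random(seed)
    n = len(patient_ids)
    # Half the patients get the drug, the rest the placebo (assumed split).
    treatments = ["drug"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(treatments)

    labels = {}
    key = {}
    for number, (patient, treatment) in enumerate(
        zip(patient_ids, treatments), start=1
    ):
        code = f"T{number:03d}"  # the code itself reveals nothing
        labels[patient] = code
        key[code] = treatment
    return labels, key


labels, key = allocate(["P1", "P2", "P3", "P4", "P5", "P6"], seed=1)
print(labels)  # what doctors and patients see: codes only
# The key stays with the third party until the trial has ended.
```

Because the codes are simply numbered in patient order, knowing a patient's code tells a doctor nothing about which kind of tablet it stands for.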


Her attempt to stay blinded 
was ruined by the sign-building 
side-effect of the treatment. 


As we have already said, James Lind’s 1747 experiment is credited as 
being the first-ever clinical trial. According to Bhatt (2010), the first 
double-blind clinical trial was not carried out until 1943. The trial 
investigated a treatment for the common cold. 


Activity 11 Blinding in James Lind’s scurvy experiment 


In James Lind’s 1747 experiment on scurvy, all patients were kept in the same 
space. James Lind hand-picked his 12 patients, assigned their treatments and 
assessed their outcome. Though conditions on board a ship would have been 
difficult, how might it have been possible to achieve blinding in this experiment? 


All of the problems discussed so far in connection with the aspirin experiment 
relate to the question: 


How can you make a correct assessment of the effect of a drug on an 
individual? 


This may seem at first to have little to do with statistics, but, as we shall shortly 
explain, statistics play an important part in helping doctors to analyse the results 
of clinical trials. Moreover, the kind of statistical analysis that is best suited to a 
particular experiment depends very much on the kind of control group that is 
used, on whether the doctor has taken objective or subjective measurements, 
and on other similar factors. 


Such factors (the nature of the control group, the kinds of measurement, etc.) are 
known collectively as the design of the experiment. One of the main purposes 
of this unit is to show that the design of an experiment and its statistical analysis 
go hand-in-hand. Before coming to this, however, one other major feature of 
drug-testing experiments has to be discussed: the variability of the people to 
whom the tests are administered. 


You have now covered the material related to Screencast 1 for Unit 11 (see 
the M140 website). 


2.3 Variability 


Clinical trials need both experimental and control groups of people. But how 
many people are needed in each group to assess the effect of a drug? 


Activity 12 An unsatisfactory experiment 


Suppose that two people suffering from headaches were available and you gave 
one of them aspirin and the other placebos. Why would this be an unsatisfactory 
experiment? 


Three sources of variability are readily distinguished. 

• Variability in the disease itself. 

• Variability in the response: even when patients are suffering from a similar 
severity of disease, their response to the drug may vary. 

• Inaccuracy in the method of measurement: as we saw in Subsection 2.2, 
subjective measurements of sensations like pain are necessarily imprecise, 
and even objective measurements are limited in their accuracy by 
imperfections in the measuring instruments and in the people who operate 
them. 


The relative importance of these factors will differ from trial to trial, but the total 
variability will very often be substantial compared to the effect of the drug itself. 
To illustrate this problem, we look at an example of blood glucose measurements. 


Example 5 Blood glucose 


Blood glucose levels were measured throughout the day in 24 healthy volunteers 
under controlled conditions in a clinic. All volunteers ate the same breakfast at 
7:30, the same lunch at 12:30 and their choice of dinner at 18:00. In this study 
(Christiansen, 2006), all subjects were monitored by two devices and an average 
measurement was taken. Figure 1 shows blood glucose measurements from the 
two separate devices (SCGM 1 and SCGM 2) for one volunteer. 


Figure 1 Measurements of blood glucose on two devices 


Figure 1 shows variation in measurements between the two different devices, 
and illustrates that blood glucose peaks after meals. For instance, the blood 
glucose value after lunch at 12:30 is approximately double its value just before 
lunch. 


Activity 13 Measuring blood glucose 


Given that the blood glucose level can vary so widely within a single day, what 
precautions need to be taken when testing the effect of a drug on the blood 
glucose level? 


Where possible in a study, measurements should be taken with all study 
participants in a similar state, using the same instruments in the same lab. In 
Figure 2, blood glucose levels for 21 of the volunteers in the study in Example 5 
are plotted over a 24-hour period. 


Figure 2 Blood glucose levels of 21 healthy volunteers (vertical axis: glucose concentration in mg/dl) 


It is clear that there is enormous variation in blood glucose levels between these 
21 healthy volunteers, both at the beginning of the day before any food is eaten 
and in their response to eating food. Thus there is wide variation in 
‘normal’ blood glucose, so in studies of drugs to treat diabetes, researchers must 
try to distinguish differences between drugs from variation between study 
participants. 


Often there are known reasons for physiological differences between 
individuals. Take height, for example: factors known to influence an adult’s height 
include sex, ethnicity and nationality, and health and nutrition during 
development. These factors are called sources of variation and, when 
recorded, can be used to explain variation to some extent. 


A jazz band improvising: another source of variation 


Example 6 Heart rates 


Consider a trial to assess the effect of a drug on a person’s heart rate. Like blood 
glucose, heart rates vary throughout the day. Although there are other sources of 
variation in heart rate, the main source of this variation is physical activity. 
Consequently, it is important that measurements on study participants should be 
taken at comparable levels of activity. 


Interest often focuses on maximum heart rate, even though this is often 
estimated rather than measured directly. Measuring the maximum rate can be a 
lengthy and difficult procedure and, moreover, it can be dangerous to subject 
elderly people with weak hearts to the strenuous exercise required. The 
estimates are made by measuring heart rate under mild exercise and using the 
result to predict what the maximum would be. Figure 3 shows the mean 
maximum heart rate of men and women at different ages. 


Figure 3 Mean maximum heart rates of men and women at different ages 


Activity 14 Factors to control? 


If you wished to reduce the variability in the heart rate of study participants, 
would it be advisable to ensure that the experimental and control groups were of 
the same sex, or of the same age, or both? 


Figure 3 shows the mean maximum heart rate for men and women of various 
ages, but it does not show the spread of the maximum heart rates of people of 
the same age. Individuals differ quite widely in their maximum heart rate: 
individuals of the same age undertaking a similar activity can have somewhat 
different heart rates. (Heart rate increases with physical activity.) 


At each of the ages covered by Figure 3, the population standard deviation of 
maximum heart rate is about 10 beats per minute. This is a measure of the 
natural variation in heart rate between people. It is against the background of this 
natural variation that the effect of any drug on heart rate must be judged. 
Suppose that a new drug reduced the maximum heart rate, on average, by 
3 beats per minute. The drug would have to be tested on a very large sample of 
people before an effect as small as this could be distinguished from the natural 
variation in heart rate between people. 
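

To give a rough sense of what ‘very large’ might mean here, the following Python sketch uses a standard normal-approximation formula for comparing two group means. Only the standard deviation (10 beats per minute) and the reduction (3 beats per minute) come from the text. The 5% two-sided significance level, the 80% chance of detecting the effect and the function names are assumptions made purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def se_of_difference(sd, n_per_group):
    """Standard error of the difference between two group means,
    assuming both groups have the same standard deviation and size."""
    return sd * sqrt(2 / n_per_group)


def approx_n_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate number of people per group (normal approximation).

    alpha and power are illustrative assumptions, not values from the unit.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Values taken from the text: SD of 10 beats per minute, reduction of 3.
for n in (10, 100, 400):
    print(n, round(se_of_difference(10, n), 2))

print(approx_n_per_group(sd=10, effect=3))
```

With only 10 people per group, the standard error of the difference (about 4.5 beats per minute) is larger than the effect itself. Under the assumptions stated above, roughly 175 people would be needed in each group.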


Example 7 Propranolol and heart rate 


Beta-blockers are a group of drugs primarily used in the treatment of high blood 
pressure, angina pectoris and some other conditions of the heart and the blood’s 
circulatory system. Figure 4 shows how heart rate is affected by one 
beta-blocker, propranolol. 


[Plot: heart rate (beats per minute) against measure of vigour (Rest, 25, 50, 75, 100), for control and propranolol] 


Figure 4 The effect of propranolol on heart rate 


The effect varies according to the vigour of people’s activity. Each orange dot 
denotes the mean of the heart rates of four people when they were given 
propranolol. Each black dot denotes the mean of the heart rates of the same four 
people when they received no drug. The vertical line through each dot denotes 
the spread of those four measurements. 


When the subjects (people) are resting, the effect of the drug is hardly 
detectable: the mean of the heart rates of the subjects after they have received 
the drug is slightly lower than that for the same subjects when they have not, but 
the spread of the individual heart rates (indicated by the vertical line through 
each data point) is large enough to mask this effect. When the subjects are 
undertaking more vigorous activity, the effect of the drug becomes much more 
noticeable and, when they are engaged in extremely vigorous activity, the 
difference is very large and could easily be detected, even though the spread of 
the individual heart rates is also larger than when the subjects are at rest. 


Propranolol 


Propranolol was the first successful beta-blocker developed. In 1988, Sir 
James Black (1924-2010) was awarded the Nobel Prize in Medicine for its 
discovery. Newer beta-blockers are now used to treat high blood pressure, 
but propranolol is still used to treat other conditions. 


Variability is the main reason why experimenters need to use statistics. Drug 
companies do not just want to know what happens to those patients who take a 
new drug in one particular clinical trial. They want to know the effect their drug 
would have on a much wider population: for example, the population of all people 
in the UK who might, now or in the future, suffer from a particular disease. They 
cannot test the drug on all the members of this population (some of them might 
not even have the disease yet!), so the drug is tested on a sample of people from 
the population, and the experimenters must make an inference from the results 
for this sample back to the population. 


If people did not vary at all, then the experimenters could find out all they needed 
to know by trying out their drug on just one person. They would have no need for 
statistics: a situation that, no doubt, they would greatly welcome! However, 
people vary greatly, and small effects are often important in medical contexts, so 
the need for statistics in drug-testing is paramount. 


Exercises on Section 2 


Exercise 1 Measurement scale 


Which of the following features can be measured only on a subjective scale and 
which can be measured on an objective scale? 

(a) Weight loss. 
(b) Appetite. 
(c) Sore throat. 
(d) Indigestion. 


Exercise 2 Controlling factors in a double-blind trial 


Several volunteers agree to take part in a clinical trial of a new drug which, it is 
hoped, will relieve depression. The volunteers are divided into two groups whose 
compositions in terms of age and sex are as similar as possible. Double-blind 
trials of the drug are carried out. 


(a) What sources of variability between the experimental and control groups 
might not have been controlled by this procedure? 


(b) Why might it not be justifiable to expect that the results obtained from this 
clinical trial would apply to all people suffering from depression? 


(c) How might some of the problems that you have identified in parts (a) and (b) 
be overcome? 


Exercise 3 Sources of bias? 


In a large-scale clinical trial of a new anti-depression drug, several experimenters 
assess the drug's effect on patients by interviewing them. Each experimenter 
uses a standardised questionnaire to assess the severity of a patient’s 
symptoms. What sources of bias in assessing the effect of the drug might there 
be in such a trial? 


3 Design of clinical trials 


Now that you have been introduced to some of the fundamental problems in 
designing and carrying out clinical trials (and other experiments), it is possible to 
go further and look at certain aspects of their design in more detail. Briefly, the 
two main requirements for a well-designed clinical trial are as follows. 


• It must eliminate all known forms of bias as far as possible. 


• It must be as sensitive as possible: that is, it must, without requiring a large 
number of patients or a large amount of time, have a good chance of 
accurately detecting any difference between the treatments being tested 
despite the variability of patients. 


3.1 Crossover design 


A simple way of eliminating a great deal of the variability in a trial is to give each 
person taking part both treatments. In this way, individuals act as their own 
controls. This procedure eliminates a lot of the variability that arises when 
different individuals act as experimental and control subjects. As an example, 
suppose that doctors wish to compare the effect of a beta-blocker with that of a 
placebo on blood pressure over a short period (e.g. two months). They could 
give each patient the placebo for eight weeks and then give them the active drug 
for eight weeks, measuring each patient’s blood pressure before they started and 
during the two treatment periods. Of course, they would want to eliminate the 
possibility of a placebo effect that was greater at the beginning than at the end of 
treatment, so they would give half of the patients the placebo first, followed by the 
active treatment, whereas the other half would get the active treatment first 
followed by the placebo (see Figure 5). A similar design could be used to 
compare two different beta-blockers (see Figure 6). This kind of design is called a 
crossover trial. (Strictly speaking, it is a two-period crossover trial.) A crossover 
trial is thus one in which, during the course of the experiment, each subject 
crosses over from receiving one treatment to receiving the other, or vice versa. 


Figure 5 Crossover trial to compare a beta-blocker with a placebo 

Figure 6 Crossover trial to compare two beta-blockers 


Activity 15 Unusual assumption of a crossover trial 


What unusual assumption is made in the crossover trial design? Write down one 
case where this assumption is not valid. 


If doctors were comparing two antibiotics to treat a bacterial infection, then there 
would be no point in using a crossover trial because they would expect their 
patients, or at least some of them, to be cured by the treatment. Therefore the 
group of patients entering the second period, after receiving antibiotics in the 
first, will not be in the same condition as when they started the first period. 
Besides curing the patient, any effect of the drug that tends to last beyond the 
treatment (that is, any carry-over effect) will confuse the issue, so that drugs 
should usually be given for some time before measurements are made. 


Thus the crossover design is most suited to chronic conditions (conditions that 
are relatively long-lasting) such as high blood pressure, diabetes or arthritis, and 
is not well suited to conditions which are given to spontaneous ups and downs. 
Consequently the crossover trial is of rather limited application but, when it is 
suitable, in many cases it is the best design to use. 
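

The advantage of letting individuals act as their own controls can be illustrated with a small Python sketch. The blood pressure readings below are invented for illustration and are not taken from any trial. The sketch also ignores the order in which treatments were given and any carry-over effect.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressures (mmHg) for six patients in a
# crossover trial; these numbers are invented for illustration only.
placebo = [150, 162, 171, 158, 180, 166]
drug = [144, 157, 164, 153, 173, 160]

# Between-person variability: how much patients differ from one another.
between_sd = stdev(placebo)

# Within-person differences: each patient is compared with themselves.
differences = [d - p for d, p in zip(drug, placebo)]
mean_difference = mean(differences)
difference_sd = stdev(differences)

print(round(between_sd, 1), mean_difference, round(difference_sd, 1))
```

In these made-up data, patients differ from one another by far more than the size of the drug effect, yet the within-person differences are all close to their mean. That consistency is what makes a crossover comparison sensitive.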


Activity 16 Can a crossover design be used? 


State whether a crossover design would be appropriate for the following trials. 


(a) Type 1 diabetes is a chronic condition in which patients have high blood 
sugar. It cannot be cured but it can be controlled by giving insulin. However, 
insulin can cause hypoglycaemia, where blood sugar becomes too low. A 
trial is to be set up to compare two different regimens of insulin 
administration to control type 1 diabetes in patients. The outcome 
measurement is the number of hypoglycaemic episodes. 


(b) A trial on scurvy and two diet supplements, similar to James Lind’s 
experiment. Scurvy can be cured if vitamin C is included in the diet, but 
without this, scurvy will eventually result in death. The main outcome 
measurements are whether or not a patient is cured and the time it takes to 
be cured. 




You have now covered the material related to Screencast 2 for Unit 11 (see 
the M140 website). 


3.2 Matched-pairs design 

Even if it is impossible to give both treatments to each individual, it is often 
possible to pair individuals into matched pairs by identifying particular features of 
the individuals, or of the disease from which they are suffering, that are likely to 
be important to the outcome of any treatment. The doctors can then give one 
treatment to one member of the pair and the other treatment to the other 
member (see Figure 7). 


Figure 7 Matched-pairs design with a placebo 


A trial that is designed in this way is called a matched-pairs trial. Examples of 
factors that might be matched are age, sex, severity of illness, length of time the 
patient has suffered from the illness, and the presence of other illnesses. Such a 
trial can be very useful but again has serious limitations. If a doctor finds an 
elderly lady suffering from a mild illness of one year’s duration, can another be 
found with whom to match her? How long will the doctor have to wait until a 
suitable match is found? Another problem is whether there is enough information 
about which factors really do affect the outcome of a treatment. If it does not 
matter whether the patient is a man or a woman, then there is no point in 
matching patients by sex. If, on the other hand, people of different blood groups 
respond differently to the treatment, then failure to match patients by blood group 
would reduce the usefulness of the clinical trial. 


Twins make ideal matched pairs 


A matched-pairs trial is more appropriate for fairly common, long-lasting diseases 
for which special clinics exist, or lists of patients are available. Examples are 
diabetes and asthma. In this case, the patients can be selected and matched 
(i.e. paired) before the trial starts. In diseases where patients turn up and need to 
be treated more or less at once, a matched-pairs trial is harder to organise. 


Activity 17 Organising matched pairs 


Organise the following eight patients into matched pairs. Match using the 
following order of importance: sex, smoking status, age. 

(a) Male, smoker, aged 43, 
(b) Female, non-smoker, aged 41, 
(c) Male, non-smoker, aged 47, 
(d) Female, non-smoker, aged 49, 
(e) Male, smoker, aged 44, 
(f) Male, smoker, aged 46, 
(g) Male, non-smoker, aged 45, 
(h) Male, smoker, aged 48. 


You have now covered the material related to Screencast 3 for Unit 11 (see 
the M140 website). 


3.3 Group-comparative design 


Very often it is not possible to use either a crossover or a matched-pairs trial: the 
medical condition involved may not be long-lasting (this rules out a crossover 
trial) and patients cannot be neatly divided into pairs according to factors thought 
to influence response to treatment (this rules out a matched-pairs trial). There is 
then little choice but simply to divide the patients into the two groups and to give 
one treatment to the patients in one group and the other treatment to those in the 
other (Figure 8). The principle behind this division into groups should be to 
ensure that each group is representative of the population being studied, and 
that the allocation of treatments to patients is decided at random. 


For example, if gender and age were thought to influence a person’s response to 
treatment, then care would be taken so that the control group and the treatment 
group each contained similar male-to-female ratios and that each group had 
similar age profiles. Within these restrictions, patients would be allocated to the 
treatment group or control group at random. This is discussed further in the next 
subsection and forms the basis of the group-comparative trial. It can be used 
in a very wide variety of situations and it is the most commonly used design of 
clinical trial. 


Figure 8 Group-comparative design with a placebo 


You have now covered the material related to Screencast 4 for Unit 11 (see 
the M140 website). 


3.4 Randomisation 


Each of the three designs of trial described in Subsections 3.1 to 3.3 involves an 
allocation process in which some decision has to be taken concerning the 
patients involved in the trial. For example: 


• In a crossover trial to compare a new drug with a placebo, you need to decide, 
for each patient, whether they receive the drug first and then the placebo or 
vice versa. 


• In a matched-pairs trial to compare two drugs, after pairing the patients you 
need to decide, within each pair, which patient is given which drug. 


• In a group-comparative trial to test a new drug, you need to decide, for each 
patient, whether they should be in the experimental group or the control group. 


We shall now explain the importance of using random methods in these 
allocation processes, using a group-comparative design. Similar reasoning can 
be applied to other designs of clinical trial and similar random methods of 
allocation can be devised. The use of such random methods is called 
randomisation: it is an important part of the design of clinical trials and of other 
experiments whose results are to be analysed using statistics. 


If individuals with a particular trait are more likely to be selected for the 
experimental group, while study participants of another kind are more likely to be 
selected to receive the control, whether this selection is conscious or not, any 
comparison of the two groups may be biased. This is known as selection bias, 
and the most important reason for randomisation is to eliminate this bias. The 
process of randomisation may also facilitate blinding the identity of treatments to 
the study investigators and participants. Finally, using random methods to 
allocate treatments allows the use of probability theory to express how likely it is 
that any difference in outcome between groups occurred by chance. 
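

One way to see how random allocation lets probability express ‘how likely by chance’ is a randomisation (permutation) test. This test is not one described in this unit, and the data and function name below are invented for illustration. The sketch lists every way the observed outcomes could have been split between the two groups and counts how often the split gives a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def randomisation_p_value(treated, control):
    """Two-sided p-value from an exact randomisation (permutation) test.

    Enumerates every way the pooled outcomes could have been split into
    groups of the observed sizes, as if allocation had been random.
    """
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    n_treated = len(treated)
    total = sum(pooled)

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_treated):
        group_sum = sum(pooled[i] for i in indices)
        group_mean = group_sum / n_treated
        other_mean = (total - group_sum) / (len(pooled) - n_treated)
        count += 1
        # Small tolerance guards against floating-point rounding.
        if abs(group_mean - other_mean) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical pain scores (lower is better), invented for illustration.
new_drug = [1, 2, 3, 4]
existing_drug = [5, 6, 7, 8]
print(randomisation_p_value(new_drug, existing_drug))
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so the p-value is 2/70, or about 0.03. Enumerating every split is practical only for small groups.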


As described briefly above, designing a group-comparative trial involves finding a 
procedure for allocating patients to the two groups. Choosing representative 
groups is a similar problem to that of choosing a representative sample for a 
survey (covered in Unit 4). For a survey, the sampling method should produce a 
sample of the required size. Similarly, it would be bad in a clinical trial if a very 
large number of the patients ended up in one group and only a few in the other. 


Suppose that you wish to test a new drug against an existing drug. Sometimes, 
when the effects of the control treatment (the existing drug) are particularly well 
known, experimenters use a control group that is considerably smaller than the 
experimental group, but, on the whole, it seems reasonable to insist that the 
numbers of people in the two groups should be approximately equal. One way of 
achieving this would be to allocate alternate patients to the two groups 
(experimental and control). This method is similar to systematic random 
sampling (see Subsection 2.2 of Unit 4) and is sometimes the best method. 
However, there is a danger that the doctors observing the patients might discover 
that this method had been used; for example, if the odd-numbered patients did 
not do so well as the even-numbered patients, or if alternate patients developed 
a particular side effect. The trial would not then be a double-blind trial. 


Activity 18 Avoiding patterns in treatment allocation 


How could the danger described above, that the doctors observing the patients 
discover the allocation to the two groups, be reduced? 


There are several ways in which a patient could be randomly allocated to the 
experimental or control group. Perhaps the simplest method is to toss a coin for 
each subject. If it comes down ‘heads’, then the patient is allocated to the 
experimental group; if ‘tails’, then the patient goes in the control group. In 
practice, a random number generator on a computer would be used in place of 
the coin. 
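

A minimal sketch of the coin-tossing method, using Python's standard random number generator, might look like this. The patient identifiers, the function name and the seed are hypothetical.

```python
import random


def coin_toss_allocation(patient_ids, seed=None):
    """Allocate each patient independently, like tossing a fair coin."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    allocation = {}
    for patient in patient_ids:
        heads = rng.random() < 0.5
        allocation[patient] = "experimental" if heads else "control"
    return allocation


# Hypothetical patient identifiers.
patients = [f"P{i:02d}" for i in range(1, 21)]
allocation = coin_toss_allocation(patients, seed=2024)
n_experimental = sum(g == "experimental" for g in allocation.values())
print(n_experimental, len(patients) - n_experimental)
```

Note that nothing in this method forces the two groups to be the same size.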


In Unit 4 (Subsection 1.3) you saw that random samples tend to be 
representative. In practice this means that, if the number of patients is large 
enough, then the use of random numbers to allocate them to experimental and 
control groups generally results in the two groups being fairly similar in all their 
features, including features that might not have occurred to the experimenter as 
being important. For instance, the ratio of males to females will be about the 
same in the two groups, as will the proportions of people of different ages, etc. 
However, often it makes sense to build balance into a trial design, and we will 
briefly explain why this is and how it is done. (You have already seen in 
Activity 10 (Subsection 2.2) the kind of problem that can occur if there is not 
balance.) 


If a trial is small, there is a risk that the control and treatment groups may have 
very uneven numbers. One solution to this would be to decide the number of 
participants in the study beforehand, so the number to be allocated to each 
group is known and these can be randomly ordered. This would be similar to 
putting the appropriate number of ‘A’s and ‘B’s into a bag and withdrawing the 
letters to assign the groups. 
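

The ‘letters in a bag’ idea can be sketched in Python by shuffling a list that contains the required number of each letter. The function name and patient identifiers are hypothetical, and the sketch assumes an even number of patients so that the groups are exactly equal in size.

```python
import random


def bag_allocation(patient_ids, seed=None):
    """Allocate patients by drawing 'A's and 'B's from a bag without replacement.

    Assumes an even number of patients, so the groups are exactly equal.
    """
    n = len(patient_ids)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of patients")
    bag = ["A"] * (n // 2) + ["B"] * (n // 2)
    random.Random(seed).shuffle(bag)  # random order of withdrawal
    return dict(zip(patient_ids, bag))


patients = [f"P{i:02d}" for i in range(1, 13)]  # hypothetical identifiers
allocation = bag_allocation(patients, seed=7)
print(allocation)
```

Whatever the order of withdrawal, six patients receive ‘A’ and six receive ‘B’.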


A trial may be carried out over several different locations, such as different clinics, 
known as centres, or over several time periods, for example, spring admissions 
and winter admissions. It is preferable to balance numbers assigned to the 
treatment and control groups within each centre and/or time period. Similarly, the 
investigators may decide that it is important to balance allocation to groups for 
other important variables, such as age or disease progression. This is achieved 
using stratified randomisation, with strata defined for centre, time period, age 
or disease progression status as necessary. (Stratified sampling was described 
in Subsection 4.2 of Unit 4.) Each stratum is treated as if it has its own separate 
mini-trial, and randomisation for each stratum is carried out independently. This 
is only achievable with sufficient numbers of trial participants within each stratum. 
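

The idea of a separate ‘mini-trial’ within each stratum can be sketched as follows. This is one simple way of doing it, not necessarily how any particular trial is run. The patient records, the centre names and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_allocation(patients, stratum_of, seed=None):
    """Randomise separately within each stratum, balancing groups in each.

    patients: list of patient records (here, dictionaries).
    stratum_of: function returning the stratum label for a patient.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)

    allocation = {}
    for members in strata.values():
        # Equal numbers of each label (one extra control if the size is odd).
        half = len(members) // 2
        labels = ["treatment"] * half + ["control"] * (len(members) - half)
        rng.shuffle(labels)
        for patient, label in zip(members, labels):
            allocation[patient["id"]] = label
    return allocation


# Hypothetical patients at two centres; all details are invented.
patients = [
    {"id": i, "centre": "Clinic 1" if i <= 6 else "Clinic 2"}
    for i in range(1, 11)
]
allocation = stratified_allocation(patients, lambda p: p["centre"], seed=1)
print(allocation)
```

In this example, Clinic 1 has three patients in each group and Clinic 2 has two in each, whatever the random order turns out to be.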


The following summarises the three different designs of clinical trial that have 
been considered in this section. 


• The crossover trial. Each person acts as his or her own control: during 
the course of the trial, each person crosses over from having one 
treatment to having the other, or vice versa. 


• The matched-pairs trial. Each person in one group is matched as closely 
as possible with a person in the other group. 


• The group-comparative trial. People are allocated randomly to two 
groups, usually in such a way that the two groups contain approximately 
the same number of people. 


The following box summarises their drawbacks and advantages. 


• The crossover trial eliminates the variability that would arise from using 
different people in the experimental and control groups, but it cannot be 
used when the experimental treatment irreversibly alters a patient's 
condition, nor is it suited to short-lasting diseases. 


• The matched-pairs trial eliminates much of the variability that arises from 
using different people as experimental and control subjects, but it can be 
difficult to achieve a good match between the experimental and control 
groups. 


• The group-comparative trial does not eliminate the variability that arises 
from using different individuals in the experimental and control groups, 
but it is relatively easy to set up. 


Exercises on Section 3 


Exercise 4 What type of trial is being used? 


State which of the three designs of trial is being used in each of the following 
experiments: a crossover, a matched-pairs or a group-comparative design. 


(a) Ten pairs of identical twins are found. The experimenter allocates one of 
each pair of twins at random to the experimental group and one to the 
control group. He administers a new drug to the experimental group and a 
placebo to the control group. 


(b) Eighty people suffering from arthritis are divided randomly into two groups. 
One group receives a new drug, the other a placebo. 


(c) Another eighty people suffering from arthritis are divided randomly into two 
groups. One group receives a new drug for three months and then a 
placebo for another three months. The other group receives the placebo for 
three months and then the drug for three months. 


Exercise 5 What type of trial should be used? 


Of the three designs of clinical trial, which would be most appropriate to use in 
each of the following tests? 


(a) To test a drug which alleviates an unpleasant and long-lasting symptom of a 
disease (e.g. pain) without curing the disease. 


(b) To test a drug which improves a condition that is rare and requires 
immediate treatment. 


(c) To test a drug which helps people to give up smoking. 


4 Analysing the data 


In this section we shall describe the type of analysis commonly used for data 
from clinical trials like those in the last section. This will include more general 
points concerning the collection and analysis of data from scientific experiments. 


4.1 What analysis? 


Imagine that a research laboratory has completed a clinical trial of a new drug. 
The researchers decided the design for their experiment, selected the patients 
and used the appropriate random allocation process. They then administered the 
appropriate treatments (one experimental and one control) to the correct patients 
at the correct time and measured the effects, i.e. they collected the data. They 
now want to investigate whether the experimental treatment really differs from 
the control treatment in its effect on patients. How should they analyse these data? 


It would be appropriate to use hypothesis testing. Units 6-10 described several 
statistical tests that can be used to investigate whether a hypothesis like this is 
tenable. 


Activity 19 The null and alternative hypotheses 


Suppose that a drug company wishes to compare the effect of a new drug on 
headaches with that of an existing drug. In the terminology of Unit 7 
(Subsection 1.3), what would be the null hypothesis, and against what alternative 
hypothesis should the null hypothesis be tested? 


Activity 20 A one- or two-sided test? 


Section 6 of Unit 10 discussed one-sided and two-sided hypothesis tests. Would 
a one-sided or a two-sided test be more appropriate to test the hypotheses given 
in Activity 19? 


You might have thought that a one-sided test would be appropriate in Activity 20, 
on the grounds that the drug company would not be interested in testing a drug 
unless they were fairly certain that it could not be less effective than an existing 
treatment. However, even if the company do think that, the results of their 
experiment have to be made available to the European Medicines Agency (EMA) 
for scrutiny, and the EMA do not wish to rule out the possibility that the new drug 
is worse than the old by using a one-sided test. 


4.2 Hypothesis testing 


You have already met several hypothesis tests, and there are many others 
available. In analysing data from experiments like those described in Section 3, 
each test does roughly the same job. The experimenter sets up a null hypothesis 
of no difference (in median or mean) between two populations. The data consist 
of measurements made on samples from each of the two populations. The 
outcome of the hypothesis test is that the null hypothesis either is or is not 
rejected, on the basis of a test statistic calculated from the observed data from 
the two samples. Section 4 of Unit 6 and Subsection 1.3 of Unit 10 describe the 
process of hypothesis testing, which is summarised in Figure 9. 


[Flowchart: set up the hypotheses; find the value of the test statistic; look up the critical value; compare the test statistic with the critical value; then either reject or do not reject the null hypothesis] 


Figure 9 Steps in a hypothesis test 


To use a test, the experimenter might decide the significance level, which is the 
probability of incorrectly rejecting the null hypothesis when it is actually true. 
Alternatively, the p-value (significance probability) given by the test might be 
used to evaluate the strength of evidence against the null hypothesis. 


In the present context, if the clinical trial produces two batches of measurements, 
one on the effect of the new drug on headache pain and the other on the effect of 
the existing drug, then a hypothesis test at a given significance level will help to 
decide whether the two drugs differ in their effects when they are applied to the 
population from which the samples were chosen. 
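

As an illustration of the steps involved, here is a sketch of a two-sided, two-sample z-test of the null hypothesis of no difference in population means. It assumes reasonably large samples. The summary statistics and the function name are invented, and other tests from earlier units may be more appropriate for a given trial.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided two-sample z-test of 'no difference in population means'.

    Returns the test statistic and the p-value; suitable only for
    reasonably large samples.
    """
    ese = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)  # estimated standard error
    z = (mean_a - mean_b) / ese
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics for pain relief scores (invented).
z, p = two_sample_z_test(mean_a=6.1, sd_a=2.0, n_a=50,
                         mean_b=5.2, sd_b=2.2, n_b=50)
significance_level = 0.05
print(round(z, 2), round(p, 3))
print("reject H0" if p < significance_level else "do not reject H0")
```

For these invented figures the p-value is about 0.03, so the null hypothesis would be rejected at the 5% significance level but not at the 1% level.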


It is important to remember that what the doctor, scientist, or anyone else 
decides to do with the result of a hypothesis test is a matter of human judgement, 
not statistics. One factor influencing this decision should be the significance level 
of the test, so the user of the test must decide what significance level to use. In 
science and medicine there is a strong convention of using a 5% significance 
level, but there is nothing sacred about 5%. There may be occasions when it is 
sensible to use a different significance level: for example, 1% or (occasionally) 
10%. 


The following table was given in Section 5 of Unit 6 for the interpretation of 
p-values. Although the interpretation should inform subsequent decision making, 
deciding future actions is more complex than simply examining a p-value. 


Table 1 Interpretation of p-values 


p-value             Rough interpretation 

p > 0.10            Little evidence against the hypothesis 
0.10 > p > 0.05     Weak evidence against the hypothesis 
0.05 > p > 0.01     Moderate evidence against the hypothesis 
0.01 > p > 0.001    Strong evidence against the hypothesis 
0.001 > p           Very strong evidence against the hypothesis 
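
The bands in Table 1 can be written as a simple lookup. The following Python sketch 
is illustrative only: the function name interpret_p_value is made up for this 
example and, because Table 1 uses strict inequalities, the handling of a p-value 
that falls exactly on a boundary (here it is placed in the weaker-evidence band) 
is an assumption. 

```python
def interpret_p_value(p):
    """Return the rough interpretation from Table 1 for a p-value."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: a p-value exactly on a boundary goes in the weaker band.
    if p >= 0.10:
        return "Little evidence against the hypothesis"
    if p >= 0.05:
        return "Weak evidence against the hypothesis"
    if p >= 0.01:
        return "Moderate evidence against the hypothesis"
    if p >= 0.001:
        return "Strong evidence against the hypothesis"
    return "Very strong evidence against the hypothesis"


print(interpret_p_value(0.03))  # Moderate evidence against the hypothesis
```

Such a lookup only labels the strength of evidence; it does not decide what 
should be done next. 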


The critical region is smaller for a hypothesis test at the 1% significance level 
than at the 5% significance level — if the null hypothesis is rejected at the 1% 
level then it will also be rejected at the 5% significance level. This has the 
following obvious consequences. (See Subsection 5.2 of Unit 8, on errors.) 


• The probability of a type 1 error is smaller if a 1% significance level is used 
than if a 5% significance level is used; that is, there is a smaller chance of 
incorrectly rejecting the null hypothesis when it is really true. 


• Other things being equal, the probability of a type 2 error will be greater using 
a 1% significance level than using a 5% significance level; that is, using a 1% 
significance level you are more likely to fail to reject the null hypothesis when 
it is really false. 
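
To see why the critical region shrinks, consider a two-sided z-test. The sketch 
below, in Python, uses the standard library's NormalDist to find the critical 
values at the two significance levels. The helper name and the test statistic 
z = 2.2 are hypothetical, chosen only to illustrate a result that is significant 
at the 5% level but not at the 1% level. 

```python
from statistics import NormalDist


def two_sided_z_critical_value(significance_level):
    """Value c such that |z| >= c has the given probability when H0 is true."""
    return NormalDist().inv_cdf(1 - significance_level / 2)


c5 = two_sided_z_critical_value(0.05)
c1 = two_sided_z_critical_value(0.01)
print(round(c5, 2), round(c1, 2))  # 1.96 2.58

# A hypothetical test statistic: rejected at 5%, not rejected at 1%.
z = 2.2
print(abs(z) >= c5, abs(z) >= c1)  # True False
```

Any test statistic large enough to be rejected at the 1% level (beyond about 
2.58) is necessarily beyond 1.96 as well, so it is also rejected at the 5% level. 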


Overall then, a test using a 1% significance level is more cautious about rejecting 
the null hypothesis than is the same test using a 5% significance level. When 
deciding the significance level to use in a particular trial design, doctors and 
other scientists have to take into account all the medical, ethical, social and 
financial implications of the clinical trials that they are planning and of the 
decisions that they have to make. 


Table 2 Type 1 and type 2 errors 


                    H0 true         H0 false 
H0 not rejected     Correct         Type 2 error 
H0 rejected         Type 1 error    Correct 


All the hypothesis tests that were introduced in Units 6—10 involved the 
assumption that the data came from samples chosen at random from the 
populations of interest. In drug-testing, although random methods are used 
(e.g. to allocate patients to the experimental and control groups), it is not 
common to use the kind of random sampling methods described in Unit 4 to 
choose the patients who are going to take part in the trial. Indeed, it is usually 
impossible to do so. 
Example 8 Influenza 


Suppose that a drug company wants to know about the effect of a new drug on 
the symptoms of influenza in people living in the UK. The company could select 
two random samples of people from the population of the UK, giving the people 
in one sample the new drug and those in the other sample an existing treatment. 
However, unless everyone in both samples already had influenza, there would be 
nothing to measure. Since it would not be considered ethically acceptable to 
infect the people in the samples with influenza, this approach will not work. 


The experimenters must choose the groups for this trial from people who already 
have influenza, and proceed as if these people are a random sample from the 
population of people in the UK who might have caught influenza. This should 
result in sound inferences back to this population, because the groups will 
usually form samples that are representative of the population even if they are 
not strictly random samples. 




In practice, experimenters are often not very precise in specifying exactly to 
which populations their results refer. This is reflected in a common, but 
misleading, piece of jargon which you may well have heard. An experimenter 
might say casually that the experimental and control groups in their experiment 
differed significantly. By this they probably mean the following. 


‘I have carried out a hypothesis test, using the null hypothesis that the 
populations from which my two groups were chosen did not differ in their 
means, medians (or some other measure). The result of the hypothesis 
test was that the null hypothesis of no difference was rejected.’ 


Thus, to say that two samples differ significantly is to infer something about the 
populations from which those samples were drawn. Another point to remember 
is that a difference which is significant in this sense may be of no practical 
significance whatsoever. 


You may also see the following type of sentence in a report of a scientific 
experiment. 


‘The two groups differed significantly (p < 0.05).’ 


This has a similar interpretation, but gives the significance level: there is 
evidence that the groups come from populations whose means are not identical, 
on the basis of a hypothesis test using a significance level of 5%. 


Although there are many hypothesis tests available, this does not mean that 
researchers can choose any test they like, because each test requires certain 
conditions to be fulfilled before it can be used, and these conditions vary from 
one test to another. In particular, when choosing a test it is important to ask the 
following questions. 


• What type of data has the experiment produced? 


• Does the design of the experiment involve matching (i.e. pairing) the subjects 
in the two groups? 


We shall now consider both questions in the following two subsections. In 
Subsection 4.3 we will consider the types of data. Then in Subsection 4.4 we will 
consider how the design of the experiment affects the choice of test. 
4.3 Types of data 


In earlier parts of the module you have met different kinds of data. First, there 
are categorical (or nominal) data. This form of data arises when there are 
several mutually exclusive categories and each item or person belongs in exactly 
one of the categories. Here, the data do no more than identify the category each 
item or person belongs in. For example, you could ask parents what medicine 
they used when their child last had a temperature, and record their answers as 
paracetamol, ibuprofen, none or other. There is very little quantitative information 
contained in the response of a single parent. You cannot say that a 
paracetamol-containing medicine is quantitatively different (e.g. larger or smaller) 
from an ibuprofen-containing medicine. You can, however, count the number of 
parents who fall into each category. 


Various hypothesis tests are available that can be applied to categorical data: the 
tests introduced in this module are the sign test (Sections 4 and 5 of Unit 6) and 
the χ² test (Section 4 of Unit 8). The sign test could be used to examine, say, 
whether paracetamol or ibuprofen was preferred by more parents. The χ² test 
could be used to examine, for example, whether the medicine preferences of one 
group of parents differed significantly from those of another group of 
parents. 


Another type of data, called ordinal data, contains more quantitative information 
than categorical data. A Likert scale, introduced in Unit 4 (Subsection 3.1), is an 
example of a measure that gives ordinal data. A researcher asks patients to rate 
their sleep quality on a scale from 1 to 4, with 4 signifying that their sleep quality 
is very good, 3 fairly good, 2 fairly bad and 1 that it is very bad. Here one can say 
that a rating of 4, for example, is quantitatively different from a rating of 2. It is 
obviously higher. Data on an ordinal scale like this can therefore be ordered, but 
it is not really possible to do anything more ambitious with them than that. It 
would not really be possible to say that a person who rated their sleep as 3 had 
three times the quality of sleep of a person who rated it 1! It would not even be 
possible to say that the difference between ratings of 3 and 4 was the same as 
the difference between ratings of 1 and 2; to be precise, it does not make sense 
to subtract these ratings. 


There are specific hypothesis tests available for ordinal data, though they are not 
taught in this module. It is possible to analyse ordinal data as nominal data, 
though information about ordering is lost. 


Measurements on an interval scale (interval scale data) contain still more 
quantitative information. Data of this kind are what you might think of as real 
measurements, since they are actual quantities in definite units, such as height in 
centimetres, age in years, heart rate in beats per minute and so on. Here it does 
make sense to say that, for instance, the difference between two people’s heights 
of 147 cm and 151 cm is the same as the difference between 185 cm and 189 cm. 


Again, there are statistical techniques which are suitable for testing hypotheses 
about interval scale data. These include z-tests, described in Unit 7, and t-tests, 
described in Unit 10. 


Examples of all three types of data occur in medicine. 


Activity 21 Types of data 


For each of the following, say whether the recorded information gives categorical, 
ordinal or interval scale data. 


(a) Body temperature (in °C). 

(b) Pain, rated on a four-point scale from mild to severe. 

(c) A woman of childbearing age’s reproductive state: pregnant or not pregnant. 

(d) Systolic blood pressure (in mmHg). 


You have now covered the material related to Screencast 5 for Unit 11 (see 
the M140 website). 


4.4 Which test? 


Nominal data that can be easily categorised and placed in a contingency table 
arise frequently from clinical trials. For example, the number of test subjects 
meeting some criterion that determines whether a drug is successful can easily be 
tabulated. In Unit 8, you met contingency tables and were shown how to apply 
the χ² test. The χ² test can also be applied to clinical trial data. 
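
As a reminder of the arithmetic, the following Python sketch computes the χ² 
statistic for a contingency table. The counts are made up purely for 
illustration and do not come from any trial. The function name is hypothetical, 
and the sketch assumes that no continuity correction is applied, so check the 
procedure of Unit 8 before relying on it. 

```python
def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts: rows are treatments, columns are (success, failure).
illustrative_table = [[30, 20],   # new drug
                      [20, 30]]   # existing drug
chi2 = chi_squared_statistic(illustrative_table)
degrees_of_freedom = (len(illustrative_table) - 1) * (len(illustrative_table[0]) - 1)

print(chi2, degrees_of_freedom)  # 4.0 1
```

With one degree of freedom, the 5% critical value of χ² is 3.841 and the 1% 
critical value is 6.635, so for these invented counts the null hypothesis would 
be rejected at the 5% significance level but not at the 1% level. 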


Interval scale data from clinical trials are quite commonly analysed using z-tests 
or t-tests. A number of such tests have been described in Units 7 and 10: 


• The two-sample z-test and two-sample t-test, which are used to test the null 
hypothesis that the means of two populations are equal. 


• The one-sample z-test and one-sample t-test, which are used to test the null 
hypothesis that a population mean equals some specified value. 


• The matched-pairs z-test and matched-pairs t-test, which are used to test the 
null hypothesis that the mean difference between two responses is equal to 
zero. (Recall from Subsection 4.2 of Unit 10 that the matched-pairs z-test and 
matched-pairs t-test are just the corresponding one-sample tests performed 
on the differences within pairs.) 


The appropriate z- or t-test to use will depend on the number of samples, the 
sample size or sample sizes, whether or not the data are matched, and what 
assumptions are satisfied. These issues have been discussed in Units 7 and 10 
and will be returned to in Unit 12. Here we will only mention again the 
relationship between matching and clinical trial design. 


Section 3 described the advantages and difficulties of using both a crossover trial 
and a matched-pairs trial. When each subject acts as his or her own control, or 
when each can be paired with a control subject who is similar in those features 
known to be important to the experiment, then it is possible to eliminate a lot of 
unwanted variability from the experiment. Both of these trial designs produce 
data which are in the form of matched pairs. In a crossover trial, there are two 
measurements for each person: one for each treatment. These are clearly 
paired. With a matched-pairs trial, the data from each person are paired with the 
data from the other half of the matched pair. 
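
Because a matched-pairs test is a one-sample test on the within-pair 
differences, the calculation is short. The Python sketch below shows one way of 
computing the t statistic. The responses are invented for illustration, the 
function name is hypothetical, and the usual assumption is made that the 
differences come from a normal distribution. 

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second):
    """t statistic and degrees of freedom for the within-pair differences."""
    if len(first) != len(second):
        raise ValueError("each pair needs one value for each treatment")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # One-sample t statistic for the null hypothesis of zero mean difference.
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Made-up responses for five subjects under two treatments.
treatment_a = [12, 15, 17, 20, 24]
treatment_b = [10, 11, 11, 12, 14]
t, df = matched_pairs_t(treatment_a, treatment_b)
print(round(t, 3), df)  # 4.243 4
```

For 4 degrees of freedom, the 5% critical value for a two-sided t-test is 2.776, 
so with these invented data the null hypothesis of zero mean difference would be 
rejected at the 5% significance level. 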


The following summarises the above discussion, regarding choice of statistical 
test. 


Choice of hypothesis test, based on data and design 


Type of data      Design               Test 

Categorical                            χ² test 
Interval scale    Group comparative    two-sample z- or t-test 
Interval scale    Matched pairs        matched-pairs z- or t-test 
Interval scale    Crossover            matched-pairs z- or t-test 
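
The summary table amounts to a small lookup, which can be sketched in Python as 
follows. The function name choose_test and its string labels are made up for 
this illustration; the sketch covers only the combinations listed in the table 
and does not check the assumptions that each test requires. 

```python
def choose_test(data_type, design=None):
    """Look up the test suggested by the summary table (illustrative helper)."""
    data_type = data_type.lower()
    if data_type == "categorical":
        return "chi-squared test"
    if data_type == "interval scale":
        tests_by_design = {
            "group comparative": "two-sample z- or t-test",
            "matched pairs": "matched-pairs z- or t-test",
            "crossover": "matched-pairs z- or t-test",
        }
        if design is None or design.lower() not in tests_by_design:
            raise ValueError("unknown design for interval scale data")
        return tests_by_design[design.lower()]
    raise ValueError("data type not covered by the summary table")


print(choose_test("interval scale", "crossover"))  # matched-pairs z- or t-test
```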


Exercises on Section 4 


Exercises 6 and 7 will refresh your memory of the χ² test that you met in Unit 8 
and the t-test that you met in Unit 10. They illustrate how these tests can be 
applied to data from clinical trials. For convenience, Table 28 from Unit 8 and 
Table 2 of Unit 10 are reproduced below. (Versions of these tables are also given 
in the Handbook.) These give critical values for these tests. 


Table 3 Table of critical values of χ² 


Degrees of    Critical values of χ² 
freedom       at significance level 
              5%         1% 

1             3.841      6.635 
2             5.991      9.210 
3             7.815      11.345 
4             9.488      13.277 
5             11.070     15.086 
6             12.592     16.812 
7             14.067     18.475 
8             15.507     20.090 
9             16.919     21.666 
10            18.307     23.209 
11            19.675     24.725 
12            21.026     26.217 
Table 4 5% critical values for a two-sided Student's t-test 


Degrees       Critical value    Degrees       Critical value 
of freedom    (t_c)             of freedom    (t_c) 

1             12.706            21            2.080 
2             4.303             22            2.074 
3             3.182             23            2.069 
4             2.776             24            2.064 
5             2.571             25            2.060 
6             2.447             26            2.056 
7             2.365             27            2.052 
8             2.306             28            2.048 
9             2.262             29            2.045 
10            2.228             30            2.042 
11            2.201             31            2.040 
12            2.179             32            2.037 
13            2.160             33            2.035 
14            2.145             34            2.032 
15            2.131             35            2.030 
16            2.120             36            2.028 
17            2.110             37            2.026 
18            2.101             38            2.024 
19            2.093             39            2.023 
20            2.086             40            2.021 
Exercise 6 Testing a new contraceptive pill 


A drug company wished to test the efficacy of a new oral contraceptive by trying 
it out on volunteers. So 2000 volunteers were allocated randomly to two groups: 
the experimental group of 1000 women took the new contraceptive whilst the 
control group of 1000 women took an existing contraceptive. At the end of a 
one-year period, each woman taking part in the test was recorded as either 
having conceived or not. Suppose that only one of the 1000 women taking the 
new contraceptive had conceived by the end of the year, whereas 15 out of the 
1000 women taking the existing contraceptive had conceived. 


(a) What type of data is involved? 
(b) Tabulate the data in an appropriate contingency table. 
(c) Write down the appropriate null and alternative hypotheses. 
(d) Carry out the test. 
(e) What do you conclude? 


Exercise 7 A drug for altering heart rate 


Eight people took part in a crossover trial whose purpose was to discover 
whether a new drug alters people’s heart rate. The trial was a double-blind trial 
and the control treatment was to give each person a placebo. The data from this 
trial are in Table 5. 
Table 5 Heart rates of eight participants in a crossover trial 


Heart rate (beats per minute) 


Subject Drug Placebo 
1 90 105 
2 82 88 
3 95 90 
4 80 89 
5 88 80 
6 75 110 
7 83 84 
8 90 100 
(a) What type of data is involved? 
(b) What hypothesis test is appropriate for these data? 
(c) What distributional assumptions are necessary in order to use this test? 
(d) Write down the appropriate null and alternative hypotheses. 
(e) Carry out the test. What do you conclude? Does the new drug alter people’s 
heart rate? 


5 Drugs in society 


Much of the discussion in this unit so far has been about the effectiveness of a 
drug. Another, equally important, aspect of drug-testing is assessing the 
unwanted effects of drugs. In this section we shall describe some ways in which 
these are discovered, and look at the problems involved in ensuring that drugs 
are as safe as possible. 


5.1 Side effects 


The unwanted effects of drugs, side effects, can be of a number of different 
kinds. Here are some examples. 


• A drug may make people feel drowsy, suffer from hallucinations or have 
headaches. Such effects may be mild or serious. Even if they are only mild, 
however, they may cause patients to stop taking the drug and thus reduce its 
usefulness, especially if the disease itself is not very serious. 


• A drug may interfere with the function of particular organs in the body. It might 
speed up the heart, damage the liver or cause birth deformities. Many of the 
serious hazards of drugs are of this kind and may be permanent, but effects of 
this kind may also be milder. 
[Cartoon caption: ‘Listen, when the side effects of this medication 
kick in, you'll forget what was wrong in the first place!’] 


Certain unwanted effects of drugs are common. For example, many people 
taking the pain-reliever codeine may become constipated. Other effects are rare. 
Clearly a non-serious, uncommon effect is of little importance. It is not at all 
worrying if, one in a thousand times, a drug causes a headache. But if a drug 
causes serious liver damage one in a thousand times, then it is most probably 
not a useful drug. The question of how important an unwanted effect is depends 
not only on the severity of the effect but also on the severity of the disease being 
treated. Severe liver damage as a consequence of a treatment for a headache 
would be unacceptable, whereas if a drug company developed a cure for rabies 
(which is almost invariably fatal once its symptoms become apparent), then even 
a treatment with a side effect which killed one in ten of the recipients would be 
considered a major advance. Whether or not the benefits of a new drug outweigh 
the risks is precisely what the experts at the EMA aim to establish in order to 
make their decision on a new drug approval. (Liver damage has indeed been 
linked to the over-the-counter pain-relief drug paracetamol, but only after 
overdose or constant long-term use.) 


Sometimes investigators in clinical trials have a pretty good idea of what side 
effects they expect from a new drug. There may be suggestions from preliminary 
experiments on animals that a particular effect may be a problem; other related 
drugs may show such an unwanted effect; or the standard treatment may have 
an undesirable side effect which, it is hoped, will not occur with the new 
treatment. In all these cases, it is necessary to devise a method of measuring 
the effect in question. 


This process is similar to that described in Subsection 2.2 for measuring the 
desired effects of drugs. 


Sometimes investigators have some idea of the effects for which they are 
looking, but do not know them exactly. It is well known that quite a lot of drugs 
have effects, such as headache, constipation and tremor symptoms, which are 
not uncommon in the population as a whole, whether or not drugs have been 
taken. In many clinical trials the severity of such symptoms is entered on a 
checklist of symptoms which, it is known from past experience, are often 
connected with drug treatment. An example checklist is given below. 
Symptom checklist 


Symptom                     Noticed in past    Severity of symptoms 
                            two weeks?         on 0-3 scale* 

Heart pounding 
Sleepiness 
Headache 
Sweating 
Dizziness 
Trembling 
Blurred vision 
Eye strain 
Difficulty passing urine 
Constipation 
Diarrhoea 
Nausea 
Vomiting 
Indigestion 
Lack of appetite 
Funny taste in mouth 
Dry mouth 
Difficulty in breathing 
Rash 
Itchiness 


*0 = absent, 1 = mild, 2 = moderate, 3 = severe 


[Cartoon caption: ‘I didn't experience any of the side effects 
listed in the enclosed literature. Should I be concerned?’] 


Very occasionally in clinical trials, serious unexpected adverse events occur. (An 
adverse event is a side effect that has a negative effect on a patient.) These are 
events such as death, life-threatening events, hospitalisation, birth defects and 
disablement. If it is probable that the unexpected adverse event was caused by 
the drug being tested, then investigators must report the event to the authorities. 
Throughout the EU, these are entered into a single electronic system called 
EudraVigilance. Investigators will consider whether the trial should be stopped. 
During the earlier phases of testing, investigators will stop the trial immediately if 
adverse reactions are severe. 


Example 9 Phase 1 trial of TGN1412 


An extreme and exceptional case occurred during 2006, when the very first 
phase 1 trial of a new drug, TGN1412, designed to target the immune system, 
had to be stopped. The drug was administered at a dose 500 times lower than 
that found to be safe in animals. However, soon after the trial started, 
six volunteers were hospitalised: four had multiple organ failure and all six 
experienced cytokine release syndrome, which caused severe inflammation of the 
skin. Fortunately all the volunteers survived, the last being released from hospital 
after three months, though one had to have fingers and toes amputated and it 
was reported that they all remain at a long-term increased risk of developing an 
immune-system-related illness. The incident was put down to ‘unpredicted 
biological action in humans’. (See Suntharalingam et al., 2006 and Expert 
Scientific Group, 2006.) 


Example 10 Phase 2 trial of fialuridine 


In 1993, five patients died during a phase 2 trial of the new anti-viral drug 
fialuridine, designed to target hepatitis B. No sign of toxicity was detected during 
phase 1 trials, in which 67 subjects received fialuridine for two or four weeks. 
Then in the thirteenth week of the phase 2 trial, one patient suddenly developed 
hepatic toxicity (liver damage caused by a chemical). The trial was stopped. Even after 
stopping the drug, seven patients went on to develop hepatic toxicity; five died 
and two survived after liver transplants. This severe delayed reaction could not 
have been predicted; fialuridine had gradually accumulated in liver DNA. (See 
McKenzie et al., 1995.) Any similar new drug would now be tested on animals for 
a longer period, to check for this type of damage. 


5.2 Unexpected side effects post-licence 


A completely different problem is presented by those side effects that are not 
discovered until after a drug has been licensed. A well-publicised example of this 
was the drug thalidomide (see Thalidomide Society, 2006). This was a sedative 
introduced in the UK in 1958. When taken by women in early pregnancy, it 
sometimes produced very severe effects on the physical development of the 
unborn children. A more recent case was the drug efalizumab, used to treat 
chronic psoriasis (red, scaly patches on the skin), which was withdrawn in 2009 
after reports of fatal infections in long-term users; we shall return to this 
example later. Yet another historical example was the drug practolol, a 
beta-blocker, which will now be described in more detail. 


Example 11 Withdrawal of practolol 


Practolol (marketed as Eraldin) was introduced in 1970 as a beta-blocker for the 
management of heart conditions. The particular advantage of practolol was that 
it appeared to have fewer unwanted effects than other beta-blockers available at 
that time. The first indications that the drug was not, after all, safe, came in two 
letters published in the British Medical Journal in 1974. The first, by Felix and Ive 
(1974), described a characteristic rash that had developed in fourteen patients 
who had received long-term practolol therapy. The second letter, by Wright 
(1974), also reported a rash, but a few of his patients had also developed 
conjunctivitis (inflammation of the membrane that covers the front of the eye) — 
leading to severe and permanent visual impairment. Practolol seemed to be the 
common feature in all these cases. 


The company that marketed this drug, Imperial Chemical Industries (ICI), sent a 
letter to all doctors and pharmacists in the UK warning them of these possible 
side effects, requesting information on similar cases, and advising immediate 
cessation of practolol therapy in any patient developing such symptoms. 


As more data were gathered, it became clear that the adverse reaction was 
definitely associated with practolol. It also appeared that similar reactions did not 
occur with other beta-blockers. Practolol was finally withdrawn from the market in 
October 1975. By 1981, there was a total of about 2450 reports from doctors 
about reactions to the drug, including 40 deaths, 1130 cases of eye damage and 
1250 skin reactions. (See also Abraham and Davis, 2006.) 


It is important, when judging the number of reported eye reactions in 
Example 11, to consider the amount of practolol that was used during the period. 
Drug usage of this kind is usually measured in patient-years. 


Patient-years 


If a single patient takes a drug for one year, then that constitutes one 
patient-year of use of the drug. 


If two patients take the drug for six months each, then the usage is half a 
patient-year each, which again comes to one patient-year. Assuming that 
the patients take the same dose of the drug each day, the amount of drug 
consumed by one patient taking it for a year is the same as the amount 
consumed by two patients over six months, so a patient-year corresponds 
to the use of a certain amount of the drug. 


If twelve patients each take the drug for a month, or if 365 patients each 
take it for a day, the total usage is still one patient-year. 
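

The arithmetic behind these statements can be checked with a short calculation. 
In the Python sketch below, the function name total_patient_years is made up for 
this illustration, and the sketch follows the simplification used above of 
treating a month as one-twelfth of a year and a day as 1/365 of a year. 
Fractions are used so that the totals are exact. 

```python
from fractions import Fraction


def total_patient_years(durations_in_years):
    """Total usage in patient-years: the sum of each patient's time on the drug."""
    return sum(durations_in_years)


usage = {
    "1 patient for a year": [Fraction(1)],
    "2 patients for six months each": [Fraction(6, 12)] * 2,
    "12 patients for a month each": [Fraction(1, 12)] * 12,
    "365 patients for a day each": [Fraction(1, 365)] * 365,
}
for description, durations in usage.items():
    print(description, "->", total_patient_years(durations), "patient-year")
```

Each of the four cases gives a total of exactly one patient-year. Writing the 
total as a sum of individual durations also covers the case where patients take 
the drug for different lengths of time. 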


The total amount of practolol prescribed during the five years of its use amounted 
to approximately a million patient-years. The question that must be answered is 
not 


How can such occurrences with new drugs be prevented? 
but 
How can new drugs that have serious side effects be identified quickly? 


It is perhaps necessary first to justify the implication that such occurrences 
cannot be prevented (except by prohibiting all new drugs). A major factor is the 
length of time for which patients have to use a drug before the symptoms 
become apparent. With the exception of certain special cases, such as clinical 
trials of treatments for cancer, few clinical trials last for longer than a year. It is 
only when drugs are marketed that they are used for longer periods. One could 
argue that drugs should be tested in clinical trials for as long as it is proposed to 
use them in treatment. This would cause great delays in introducing drugs, but 
for some drugs this probably does not matter. Practolol, however, was a 
significant advance for many patients, because it provided genuine relief for 
people suffering from potentially fatal heart conditions. Few would wish to delay a 
drug that showed a good prospect of saving lives on the grounds of unlikely, 
although untested, possibilities of unknown risks. 


Another obvious factor is simply the rarity of the effect. If a drug causes severe 
damage to only one in a thousand of the people who take it, then quite a lot of 
people may have to take it before even the first case will be seen, and even more 
must take it before it can be known with any certainty that it is the drug that is 
causing the effect. 
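

A rough calculation shows how many people this can mean. The Python sketch 
below assumes, purely for illustration, that each patient independently has the 
same one-in-a-thousand risk of the effect; real patients differ, so the numbers 
are only indicative. The function names are hypothetical. 

```python
from math import ceil, log


def probability_at_least_one_case(risk, patients):
    """Chance of seeing one or more cases among independent patients."""
    return 1 - (1 - risk) ** patients


def patients_needed(risk, target_probability):
    """Smallest number of patients giving at least the target chance of a case."""
    return ceil(log(1 - target_probability) / log(1 - risk))


risk = 1 / 1000
print(round(probability_at_least_one_case(risk, 1000), 3))  # 0.632
print(patients_needed(risk, 0.95))  # 2995
```

Under these assumptions, even after 1000 patients have taken the drug there is 
more than a one-in-three chance that no case has yet been seen, and almost 3000 
patients are needed before there is a 95% chance of seeing at least one. Far 
more would be needed to show that the drug, rather than chance, is responsible. 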


Example 12 Withdrawal of efalizumab 


Efalizumab (marketed as Raptiva) was approved by the EMA in 2004 for the 
treatment of chronic psoriasis, an immune-mediated disease that 
causes red, scaly patches on the skin. Short-term clinical trials showed 
efalizumab to be safe, though there was a question over the risk of infection with 
long-term use. 


By the end of 2008, the drug manufacturers had made the EMA aware of a 
number of cases of serious infections in long-term users, most notably four 
long-term users who had developed an often fatal brain infection (progressive 
multifocal leukoencephalopathy), three of whom had died. 


In February 2009 the CHMP met to review the risks and benefits of efalizumab; 
its benefits in the treatment of psoriasis were only modest, whereas there was a 
risk of serious side effects. The CHMP recommended suspension of the drug’s 
licence unless a subgroup of patients could be identified in which the benefits 
outweighed the risks. In May the drug licence holders voluntarily withdrew 
efalizumab, and in June the EMA withdrew the licence (EMA, 2009; see also 
DeFrancesco, 2009, Seminara and Gelfand, 2010). 


There are other factors that can easily prevent a relatively rare side effect being 
quickly detected. One is that not all the cases that occur are notified to the 
workers carrying out the follow-up; another is that the patients involved are often 
using more than one drug. 


Although it is vital to carry out thorough, well-designed clinical trials of drugs, and 
although statistical techniques can help to decide whether or not the drug is 
useful, these trials and statistical techniques cannot, by themselves, guarantee 
that a drug will be trouble-free when it is marketed. The greater the discrepancy 
between the conditions in which the clinical trials of a drug are conducted and 
the conditions in which the drug is actually used, the greater the chance that 
unpredicted side effects of the drug will appear. When testing a drug, the 
statistical analysis of data from clinical trials helps a drug company to decide 
whether to apply for a product licence to market that drug, but further data need 
to be collected after marketing the drug. In order to monitor how the drug actually 
performs once it has been marketed, some form of post-marketing surveillance is 
needed. 


5.3 Post-marketing surveillance 


The pharmaceutical industry and national regulators throughout the EU submit 
all adverse drug reaction reports to EudraVigilance (the central electronic 
system). This includes both individual case safety reports, which are submitted if 
an individual patient has a serious adverse reaction to a licensed drug in 
general use, and unexpected serious adverse reactions that occur during clinical 
trials of unlicensed drugs. The aim of such a system is to allow any potential 
drug safety issue to be detected and investigated as early as possible. 


Where necessary, these drug safety signals are referred to the EMA. Within the 
EMA there is a separate committee who look at all aspects of drug safety, called 
the Pharmacovigilance Risk Assessment Committee (PRAC). Their 
responsibilities include scrutinising referrals from the EudraVigilance system, 
monitoring the system’s effectiveness and maintaining a list of drugs that should 
be subject to additional monitoring. PRAC will make recommendations to the 
CHMP and other committees on whether a drug’s licence should be changed or 
withdrawn. 


The trouble with a method like this is that doctors will not, generally, suspect an 
adverse drug reaction when they see something unfamiliar but will, quite rightly, 
send the patient to the appropriate specialist. The reports submitted to 
EudraVigilance will therefore tend to be biased towards the symptoms that doctors 
expect to see as adverse reactions. As a result, a less expected reaction is likely to be missed 
until a sharp-witted specialist, who sees enough cases, notices that more 
patients suffering from a particular condition are taking a particular drug than 
would be expected. In the case of the connection of both rash and conjunctivitis 
with practolol, there were distinct, unfamiliar features to alert a specialist that 
something unusual was happening. The situation is much more difficult when a 
drug causes a disorder that is common in the whole population. 


How can the situation be improved? All, or a sample, of the patients taking the
drug for a period after the drug comes onto the market can be monitored. Such
studies are sometimes called phase 4 trials, and it is a further responsibility of
PRAC to assess and evaluate these population-based studies.


Example 13 Withdrawal of rofecoxib 


Rofecoxib (marketed as Vioxx) is a drug that was used to target arthritis and 
other pain-causing conditions. It was withdrawn after population-based studies 
had shown an increased risk of heart attack and stroke. 


The drug came onto the market in 1999. A study was published in 2000, 
comparing the effectiveness and side effects of rofecoxib and another drug used 
to treat arthritis, naproxen (Bombardier et al., 2000). It found that the incidence of 
heart attacks was four-fold higher in the rofecoxib group. Rofecoxib’s 
manufacturer, Merck Sharp & Dohme, responded by claiming that the difference 
was due to naproxen having a protective effect on heart attacks. 


In 2000, Merck commenced a three-year study whose primary aim was to 
assess the effectiveness of rofecoxib for a new purpose, but it had the additional 
aim of assessing cardiovascular risk. It was found (Bresalier et al., 2005) that the 
rate of serious cardiovascular events such as heart attack or stroke, after
18 months on rofecoxib, was 1.71 events per 100 patient-years versus
0.38 events per 100 patient-years for study participants taking a placebo. The
drug was voluntarily withdrawn from the worldwide market by Merck in 2004. 
(See also Dieppe et al., 2004.) 
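
The two rates quoted from Bresalier et al. (2005) can be compared directly because they are expressed in the same units (events per 100 patient-years). The short Python sketch below is only an illustration of that comparison; the function names are our own and are not taken from the study.

```python
# Event rates quoted in Example 13, in events per 100 patient-years.
ROFECOXIB_RATE = 1.71
PLACEBO_RATE = 0.38


def rate_ratio(treated, control):
    """How many times higher the treated rate is than the control rate."""
    return treated / control


def rate_difference(treated, control):
    """Excess events per 100 patient-years in the treated group."""
    return treated - control


ratio = rate_ratio(ROFECOXIB_RATE, PLACEBO_RATE)
difference = rate_difference(ROFECOXIB_RATE, PLACEBO_RATE)

print(f"Rate ratio: {ratio:.1f}")
print(f"Excess events per 100 patient-years: {difference:.2f}")
```

On these figures the rate on rofecoxib was about four and a half times the rate on placebo, which corresponds to roughly 1.3 extra serious cardiovascular events for every 100 patient-years of treatment.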


How are population-based studies carried out? One method is to select a group 
of patients taking the drug, and to follow them up by means of regular medical 
checks, interviews and records of hospital admissions. Such a group is called a 
cohort. A control group (cohort) may also be selected, which is matched with 
this group in as many respects as possible, but those in the control cohort are 
not taking the drug. 


Activity 22 Selecting groups for post-marketing studies 


The procedure for selecting cohort groups in post-marketing studies is similar to 
the selection of groups for a clinical trial. Is the selection procedure identical to
that used in a clinical trial? If not, in what important ways is it different? 


Selecting experimental and control groups after, rather than before, they have 
received a treatment is unsatisfactory because many factors other than the drug 
treatment will be different for the two groups, though investigators will try to take 
these factors into account. This is also a very expensive method of surveillance, 
and studies may take a long time to carry out; for example, it took four years 
before rofecoxib was withdrawn. Depending on the size of the population 
monitored, rare side effects may still go undetected. Other study designs are 
available that can be used more economically to study rare side effects. 


Another approach, called record linkage, consists of obtaining access to a 
patient’s medical record so that links, other than those already known, between 
various ailments and the drugs being taken can be spotted. This approach does 
not require doctors to actively report information on their patients, but only 
requires them to permit researchers to have access to the records. For this 
reason it has many attractions, but it also raises the highly controversial issue of 
the privacy of medical records. 


None of these methods can completely prevent damage to patients from the 
unexpected actions of drugs. All they can do is reduce the time it takes to spot 
the effect so that the drug can be withdrawn, or its use limited, as soon as 
possible. 


6 Case study 


The following case study describes the progress of a particular drug through the 
different phases of test. Do not worry if you do not follow all the medical details. 


Patients who have just had a hip or knee replacement are at high risk of blood 
clots forming in the veins of their legs, which can be dangerous if the clot moves 
to a major organ such as the lungs. Likewise, patients with an abnormal 
heartbeat, called ‘atrial fibrillation’, have a high risk of blood clots which can 
cause a stroke. Patients at high risk of blood clots are prescribed anticoagulant 
drugs, which prevent blood from clotting. The main safety concern with any drug 
that prevents blood from clotting is that patients may bleed excessively. Clinical 
testing of any new anticoagulant needs to establish an acceptable trade-off 
between blood clot prevention and bleeding. 


Dabigatran (marketed as Pradaxa) is one such anticoagulant. Dabigatran was 
granted marketing authorisation by the European Commission in 2008 after 
positive evaluation by the CHMP (EMA, 2008, 2013). We will describe all the 
clinical studies carried out in humans before the drug was approved. 


6.1 Phase 1 of testing dabigatran 


In phase 1, the drug is usually given in increasing doses to healthy 
volunteers so as to evaluate biological action and safety.


After pre-clinical testing, which included testing on rats and rabbits, two phase 1
studies with healthy male volunteers were carried out. (See Stangier et al.,
2007.)

Figure: A blood clot at the centre of a blood vessel


In the first study, 40 subjects were randomised to one of five groups and, within 
each group, subjects were randomised so that two received placebo and six 
received a single dose of dabigatran. The first group received one 10 mg dose; in 
the other four groups, the single dose was, respectively, 30 mg, 100 mg, 200 mg
and 400 mg. Blood samples were taken from the subjects to observe the rate at
which the drug was eliminated from the body.

Figure: An X-ray image of a patient after a hip replacement
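
The two-stage randomisation in the first phase 1 study (first to a dose group, then to placebo or drug within the group) can be sketched in Python. This is an illustration only: the source does not describe how the randomisation was actually carried out, and the function name, the subject numbering and the use of a seed are all our own assumptions.

```python
import random


def allocate_phase1(subject_ids, doses_mg, n_placebo=2, seed=None):
    """Randomly split subjects into equal dose groups, then randomly
    choose n_placebo subjects in each group to receive placebo."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # random order, so consecutive slices are random groups
    group_size = len(ids) // len(doses_mg)
    allocation = {}
    for g, dose in enumerate(doses_mg):
        members = ids[g * group_size:(g + 1) * group_size]
        # Members are already in random order, so the first n_placebo
        # of them form a random subset of the group.
        for k, subject in enumerate(members):
            treatment = "placebo" if k < n_placebo else "dabigatran"
            allocation[subject] = (dose, treatment)
    return allocation


# Hypothetical subject numbers 1 to 40; doses are those given in the text.
allocation = allocate_phase1(range(1, 41), [10, 30, 100, 200, 400], seed=1)

for dose in [10, 30, 100, 200, 400]:
    group = [s for s, (d, _) in allocation.items() if d == dose]
    placebo = [s for s in group if allocation[s][1] == "placebo"]
    print(f"{dose} mg group: {len(group)} subjects, {len(placebo)} on placebo")
```

Each of the five groups contains eight subjects, two on placebo and six on dabigatran, matching the design described above.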


In the second study, again with 40 volunteers, doses were given three times daily 
for seven days. The first group received doses of 50 mg and subsequent groups 
were given higher doses up to 400 mg. 


At a dose of 400 mg three times a day, some volunteers bruised where they had 
been punctured by needles, and some had bleeding gums. The main safety 
concern was the occurrence of a major bleed. Major bleeds are classified 
according to a strict definition; any bleeding not classified as a major bleed is 
considered to be a minor bleed. No major bleeds were observed during the 
phase 1 testing. 


Cartoon caption: ‘That’s great, but it was supposed to be a laxative.’


6.2 Phase 2 of testing dabigatran


The studies in phase 2 are usually the first studies in which the drug is 
given to patients with the condition that the drug is designed to help. 


The aims of the first study in patients (called BISTRO I) were to check the drug’s
effectiveness, form hypotheses about the optimal dose and assess the drug’s 
safety, in particular with respect to bleeding. 


The study was conducted over nine months across 11 sites in Sweden and seven 
in Norway. (See Eriksson et al., 2004.) Dabigatran was given for six to ten days 
to patients after a hip replacement operation, with patients divided into groups 
according to when they entered the study. The first group to enter the study 
received two doses of 12.5 mg each day and the dose was steadily increased for 
subsequent groups. The ninth group received two doses of 300 mg each day. 


Guidelines were carefully set up in advance with regard to stopping the study if
the number of patients with bleeding problems became too high as dose size
increased.


• All major bleeds were recorded, and if 5% or more of patients experienced
  major bleeds the study would be stopped.


• Any other bleeding, no matter how small, was recorded as a minor bleed.
  However, some bleeding from the surgical hip replacement site was expected,
  so only excessive bleeding there was recorded.


The other issue was blood clots, which the drug was meant to prevent. Under 
guidelines for this, the dose level was to be increased for the next group of 
patients to the next dose up if 20% or more of patients who received the current 
dose experienced a blood clot. 
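
The two guidelines can be written as a simple decision rule. The Python sketch below is hypothetical: the function name and return values are ours, it assumes that the percentages are assessed separately for each dose group, and it does not attempt to say what the investigators did when neither threshold was reached, because the text does not tell us.

```python
def review_dose_group(n_bleed_patients, n_major_bleeds,
                      n_clot_patients, n_clots,
                      bleed_stop_pct=5, clot_escalate_pct=20):
    """Apply the two guidelines to the data from one dose group.

    Whole-number arithmetic is used so that a group sitting exactly on a
    threshold is classified correctly.
    """
    if n_bleed_patients <= 0 or n_clot_patients <= 0:
        raise ValueError("each group must contain at least one patient")
    # Safety rule: 5% or more of patients with a major bleed stops the study.
    if 100 * n_major_bleeds >= bleed_stop_pct * n_bleed_patients:
        return "stop"
    # Effectiveness rule: 20% or more with a clot means the next group
    # receives the next dose up.
    if 100 * n_clots >= clot_escalate_pct * n_clot_patients:
        return "escalate"
    return "neither rule triggered"


# 12.5 mg twice daily (Table 6): no major bleeds, 5 clots in 24 patients.
print(review_dose_group(27, 0, 24, 5))
```

For the lowest dose group, 5 out of 24 patients (about 21%) had a clot, so the second guideline calls for the next group to receive the next dose up.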


There were 314 patients enrolled on the study, with 289 receiving at least one 
dose and 262 patients completing the study, including a follow-up check four to 
six weeks after surgery. Although no patients experienced a major bleed, the 
study was stopped at a dose of 300 mg twice daily because two patients 
experienced minor bleeding from multiple sites. Drug safety and bleeding were
evaluated in all 289 patients who received at least one dose, but data to assess 
drug effectiveness in preventing blood clots were only available for 225 patients. 


Some results are given in Table 6. It can be seen that few patients had a blood 
clot, but minor bleeds were common. The total number of events is given in the 
last row. 


Table 6 Number of bleeds and clots in the dabigatran phase 2 trial 


     Dose regime            Data on bleeds          Data on clots
Level     Times          No. of      Minor       No. of      Clots
(mg)      per day        patients    bleeds      patients
12.5      2              27          2           24          5
25        2              28          9           21          2
50        2              30          18          27          4
100       2              40          33          31          6
150       1              41          39          33          3
150       2              29          26          21          2
200       2              28          22          21          4
300       1              46          41          33          2
300       2              20          16          14          0
Total                    289         206         225         28


Example 14 Does dosage affect the number of minor bleeds?


As there were no major bleeds, obviously there is no evidence that the dose 
regime affects the chance of a patient having a major bleed, but there is the 
question of whether daily dose level affects the number of minor bleeds. The 
numbers are quite small, so to examine this we will combine results into three 
categories: 


• total daily dose less than or equal to 100 mg
• total daily dose 150 mg or 200 mg
• total daily dose greater than or equal to 300 mg.


Thus, for the first category there are 27 + 28 + 30 = 85 patients, of whom
2 + 9 + 18 = 29 had minor bleeds, so that 85 − 29 = 56 patients had no bleed.
Results for the three groups give the following contingency table:


Table 7 Number of patients with minor bleeds and number without minor bleed, by
daily dose level


                              Total daily dose
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg   Total
With minor bleed    29         72              105        206
Without bleed       56         9               18         83
Total               85         81              123        289


The null and alternative hypotheses are:


$H_0$: Total daily dose of drug and the number of patients having
a minor bleed are independent.


$H_1$: There is a relationship between the total daily dose of drug
and the number of patients having a minor bleed.


Recall from Subsection 4.2 of Unit 8 that the Expected values are calculated from

\[
\text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}},
\]

which gives the following Expected table.


                              Total daily dose
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg
With minor bleed    60.5882    57.7370         87.6747
Without bleed       24.4118    23.2630         35.3253


Note that the Expected values are all greater than 5, so it is appropriate to use the
$\chi^2$ test.


The Residuals are obtained from

\[
\text{Residual} = \text{Observed} - \text{Expected}.
\]

Thus, for example, the Residual for the first cell is 29 − 60.5882 = −31.5882.
The following is the Residual table.


                              Total daily dose
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg
With minor bleed    −31.5882   14.2630         17.3253
Without bleed       31.5882    −14.2630        −17.3253


For the first cell, the contribution to $\chi^2$ is given by

\[
\chi^2 \text{ contribution} = \frac{(\text{Residual})^2}{\text{Expected}}
= \frac{(-31.5882)^2}{60.5882} \simeq 16.4688.
\]

Repeating the calculation for all six cells results in this $\chi^2$ table:


                              Total daily dose
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg
With minor bleed    16.4688    3.5234          3.4236
Without bleed       40.8743    8.7449          8.4972


Hence the value of the $\chi^2$ test statistic is

\[
16.4688 + 3.5234 + 3.4236 + 40.8743 + 8.7449 + 8.4972 \simeq 81.532.
\]

The number of degrees of freedom for a 2 × 3 contingency table is

\[
(2 - 1) \times (3 - 1) = 2.
\]

Hence, from Table 3 (Exercises on Section 4), the critical value at the 5%
significance level is 5.991, and at the 1% significance level it is 9.210.


Since the test statistic, 81.532, is much greater than the 1% critical value, 9.210,
we reject $H_0$ in favour of $H_1$ at the 1% significance level and conclude that there
is strong evidence of a relationship between the total daily drug dose and the 
number of patients having a minor bleed. Looking at the data in Table 7 (or at the 
Residual table), bleeds seem less likely when the daily dose is 100 mg or less 
compared with when it is 150 mg or more. 
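
The arithmetic in Example 14 can be checked with a few lines of code. The following is a minimal sketch in plain Python, not the module's own software (the Computer Book uses Minitab); the function name chi_squared_test is an illustrative choice. The observed counts are those of Table 7 and the critical values are the ones quoted above.

```python
# Observed counts from Table 7.
# Columns: total daily dose <= 100 mg, 150 or 200 mg, >= 300 mg.
observed = [
    [29, 72, 105],  # with minor bleed
    [56, 9, 18],    # without bleed
]


def chi_squared_test(table):
    """Return the chi-squared statistic, degrees of freedom and the
    Expected table for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / overall
            expected_row.append(exp)
            statistic += (obs - exp) ** 2 / exp  # (Residual)^2 / Expected
        expected.append(expected_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_squared_test(observed)

# Critical values for 2 degrees of freedom, as quoted in the text.
CRITICAL_5_PERCENT = 5.991
CRITICAL_1_PERCENT = 9.210

print(f"Test statistic: {statistic:.3f} on {dof} degrees of freedom")
print("Reject H0 at the 5% level:", statistic > CRITICAL_5_PERCENT)
print("Reject H0 at the 1% level:", statistic > CRITICAL_1_PERCENT)
```

The function works from unrounded Expected values, so its result may differ from a hand calculation in the last decimal place.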


Activity 23 Does dosage affect the number of blood clots?


One purpose of phase 2 is to start to learn whether the drug is effective. The 
purpose of dabigatran is to reduce the risk of blood clots. Table 6 gives data on 
the number of blood clots by dose regime. Combine the dose categories into 
three groups, using the same dose regime groups as in Example 14. Form a
contingency table appropriate for testing whether the total daily dose of drug and 
the number of patients having a blood clot are independent. Perform the test and 
report your conclusion. 


Phase 2 testing continued with a larger study, called BISTRO II, that shared the 
same aims as the BISTRO I study (Eriksson et al., 2005).


It was a double-blind randomised controlled trial of 1973 patients who had just 
undergone a hip or knee replacement across 60 centres in Europe and two 
centres in South Africa. Given the inconclusive results of the BISTRO I study and
the need for sensible dosing schedules to be determined for phase 3 testing,
patients in the treatment group were randomised to one of four dosing
schedules: 50 mg twice a day, 150 mg twice a day, 300 mg once a day and
225 mg twice a day. Patients randomised to the control group were given an
existing treatment proven to reduce the risk of blood clots, enoxaparin. 


Activity 24 Use a placebo as the control treatment? 


Why would it be unethical to compare dabigatran with placebo? 


Activity 25 Double-blind trial — How? 


Dabigatran is given orally in capsule form, while enoxaparin is given as an 
injection. How do you suppose it was possible to achieve blinding?


Figure 10 (a) A drug in capsule form; (b) an injection of a drug


Activity 26 Questions of interest? 


For each question that you identify in (a) and (b), state the null hypothesis that 
should be tested. 


(a) What are the main questions involving only dabigatran that should be 
examined after the data have been gathered? 


(b) What are the main questions to examine in comparing dabigatran with
enoxaparin? 


A summary of some of the main results is given in Table 8. It shows the number 
of patients experiencing major or minor bleeds and the number experiencing 
blood clots. Data on bleeds relate to all patients who received at least one dose 
of a drug in the trial — just under 400 patients for each drug/dosage regime. Data 
on the number of clots determine the efficacy of a treatment and are only given 
for those patients who completed the trial — about 300 patients for each 
drug/dosage regime. 


Table 8 Number of bleeds and clots with dabigatran and enoxaparin in BISTRO II 


Drug, dose and                     Data on bleeds              Data on clots
daily frequency             No. of     Major    Minor      No. of     Clots
                            patients   bleeds   bleeds     patients
Dabigatran 50 mg: twice     389        1        18         302        86
Dabigatran 150 mg: twice    390        16       31         282        49
Dabigatran 300 mg: once     385        18       37         283        47
Dabigatran 225 mg: twice    393        15       38         297        39
Enoxaparin 40 mg: once      392        8        25         300        72
Total                       1949       58       149        1464       293


Statistical analysis showed that, compared with enoxaparin, the rate of 
occurrence of clots was significantly lower with dabigatran when it was 
administered at 150 mg twice daily (p = 0.04), 300 mg once daily (p = 0.02) or 
225 mg twice daily (p = 0.0007). Looking at Table 8, there is a suggestion that 
taking dabigatran at these higher doses increases the risk of a major bleed, 
compared with enoxaparin, but in fact the differences are not statistically 
significant at the 5% significance level. 
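
Because the groups in Table 8 differ slightly in size, comparisons are easier to see as percentages. The Python sketch below simply converts the counts in Table 8 into percentages of patients with a clot and with a major bleed. It is a descriptive summary only and does not reproduce the p-values quoted above, since the method of analysis used by the study team is not described here. The dictionary layout and names are our own.

```python
# Counts from Table 8:
# (patients with bleed data, major bleeds, patients with clot data, clots)
bistro2 = {
    "Dabigatran 50 mg twice daily": (389, 1, 302, 86),
    "Dabigatran 150 mg twice daily": (390, 16, 282, 49),
    "Dabigatran 300 mg once daily": (385, 18, 283, 47),
    "Dabigatran 225 mg twice daily": (393, 15, 297, 39),
    "Enoxaparin 40 mg once daily": (392, 8, 300, 72),
}

clot_pct = {}
major_bleed_pct = {}
for regime, (n_bleed, major, n_clot, clots) in bistro2.items():
    clot_pct[regime] = 100 * clots / n_clot
    major_bleed_pct[regime] = 100 * major / n_bleed

for regime in bistro2:
    print(f"{regime}: clots {clot_pct[regime]:.1f}%, "
          f"major bleeds {major_bleed_pct[regime]:.1f}%")
```

The percentages show the pattern described in the text: at the three higher doses of dabigatran the percentage of patients with a clot is lower than with enoxaparin, while the percentage with a major bleed is higher.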


When administered in twice-daily doses of 50 mg, dabigatran had significantly 
lower rates of major bleeds than at other levels, and a significantly lower rate 
than enoxaparin. However, at that dosage dabigatran seemed no better (and
possibly worse) than enoxaparin at reducing the rate of clots. 


The conclusion of the BISTRO II study team was that, with dabigatran, both the 
effectiveness (reduction in the rate of clots) and safety (rate of bleeding events) 
depended on the dose level. They also concluded that the three highest doses of 
dabigatran were significantly more effective than enoxaparin at reducing the rate 
of blood clots, although at these levels there appeared to be an increase in 
bleeds. 


6.3 Phase 3 of testing dabigatran


In phase 3, treatment with the new drug is compared with existing therapies 
in a wider range of contexts. 


There were three large phase 3 studies initiated in 2004, with results published in 
2007. Their study set-up and aims were similar to the BISTRO II trial: patients 
who were undergoing hip or knee replacement were randomised to the treatment 
group, receiving dabigatran, or the control group, receiving enoxaparin, and 
effectiveness and safety outcomes, particularly bleeding, were compared. Given 
the study team’s conclusions on dosing after the BISTRO II trial, dabigatran was 
administered once per day at a dose of either 150 mg or 220 mg. 


One of the three studies, called RE-MOBILIZE, was carried out in North 
America, where the recommended dose of enoxaparin after hip or knee 
replacement is 60 mg/day, different from the 40 mg dose used throughout Europe 
(RE-MOBILIZE Writing Committee, 2009). The results of the RE-MOBILIZE trial 
were of less relevance to the CHMP in deciding whether dabigatran should be 
licensed within Europe, so will not be described further here. 


In one of the other two trials, called RE-MODEL, a total of 2101 knee 
replacement patients located at one of 105 centres in Europe, Australia or South 
Africa were randomised to one of the two treatment groups (150 mg or 220 mg 
dabigatran) or the control group (40 mg enoxaparin) (Eriksson et al., 2007a). 


In the third trial, called RE-NOVATE, a total of 3494 hip replacement patients 
located at one of 115 centres in Europe, Australia or South Africa were 
randomised (Eriksson et al., 2007b). 


Results of the latter two trials are given in the tables below. For instance, in the 
220 mg dose of dabigatran group in the RE-MODEL trial (Table 9), the numbers 
10/679 mean that, of the 679 patients for whom data are available, 10 of these 
had a major bleed. There were fewer data for blood clots than for the safety 
outcomes, because for some patients the relevant data were not collected or 
were inadequate.


Table 9 Number of bleeds and clots or death with dabigatran and enoxaparin in
RE-MODEL (knee replacement)


                                    Drug and dose
                              dabigatran            enoxaparin
                        220 mg       150 mg         40 mg
Major bleeds            10/679       9/703          9/694
Minor bleeds            60/679       59/703         69/694
Blood clots or death    183/503      213/526        193/512


Table 10 Number of bleeds and clots or death with dabigatran and enoxaparin in
RE-NOVATE (hip replacement)


                                    Drug and dose
                              dabigatran            enoxaparin
                        220 mg       150 mg         40 mg
Major bleeds            23/1146      15/1163        18/1154
Minor bleeds            70/1146      72/1163        74/1154
Blood clots or death    53/880       75/874         60/897


In Tables 9 and 10, the proportions of people with blood clots or who died are 
slightly lower with the 220 mg dose of dabigatran than with enoxaparin. For 
example, in the 220 mg dose of dabigatran group of the RE-MODEL trial,
183 patients out of 503 patients had a blood clot or died (though only 1 patient
died in this treatment group), corresponding to 36.4% of patients. In the 
enoxaparin group, 193 out of 512 patients had a blood clot or died (there was 
also only 1 death in this group), corresponding to 37.7% of patients. Also, the 
proportions of people with bleeds are slightly lower with the 150 mg dose of 
dabigatran than with enoxaparin. However, the study investigators found no 
statistically significant difference between the effectiveness or safety of either 
dose of dabigatran and enoxaparin. 
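
The percentages quoted above can be obtained directly from the entries in Tables 9 and 10, which are written in the form ‘events/patients’. The following Python sketch is an illustration with names of our own choosing. It reproduces the descriptive percentages only and makes no claim about statistical significance.

```python
def percentage(cell):
    """Convert a table entry such as '183/503' into a percentage."""
    events, patients = (int(part) for part in cell.split("/"))
    return 100 * events / patients


# 'Blood clots or death' rows of Tables 9 and 10.
clots_or_death = {
    "RE-MODEL (knee replacement)": {
        "dabigatran 220 mg": "183/503",
        "dabigatran 150 mg": "213/526",
        "enoxaparin 40 mg": "193/512",
    },
    "RE-NOVATE (hip replacement)": {
        "dabigatran 220 mg": "53/880",
        "dabigatran 150 mg": "75/874",
        "enoxaparin 40 mg": "60/897",
    },
}

for trial, groups in clots_or_death.items():
    print(trial)
    for group, cell in groups.items():
        print(f"  {group}: {cell} = {percentage(cell):.1f}%")
```

The same function can be applied to the rows for major and minor bleeds to check the statement about the 150 mg dose.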


In addition to the three phase 3 studies mentioned above, a further phase 3 
study, called RE-LY, looked at the effectiveness of dabigatran in preventing stroke 
in patients with atrial fibrillation (a heart condition which, as mentioned right at 
the start of this case study, is one where it was hoped dabigatran would be 
useful). (Connolly et al., 2009, and see also correction in Connolly et al., 2010.) 
This study, initiated in 2005, was a very large worldwide study in which 18113 
atrial fibrillation patients were randomised to one of two dabigatran dosing 
schedules or to a control group. The dabigatran dosing schedules were 110 mg
or 150 mg, twice daily, while the control group were given the most commonly 
used existing anticoagulant treatment for this condition, warfarin. The study 
design was like a cohort study: patients were followed over time, for a median of 
two years, with several follow-up visits initially, then every four months until the 
end of the study. Treatments were not blinded. Data were collected on all 
medical conditions, however minor. 


The main aim of the study was to evaluate the effectiveness of dabigatran at 
reducing the risk of stroke, and to look at the safety of long-term use of the drug 
with a particular emphasis on bleeding. The trial took several years to run, and 
results were published in 2009. We will not show a table of the results here as 
they are more difficult to interpret: the length of time in the study differed between 
patients, so person-time had to be taken into account. The main findings were: 


• At a dose of 110 mg twice daily, dabigatran was as effective as warfarin in
  reducing the risk of stroke, with a lower risk of major bleeds.


• At a dose of 150 mg twice daily, dabigatran was more effective than warfarin in
  reducing the risk of stroke, with a similar overall risk of major bleeds. The risk
  of any type of bleeding was lower, but the risk of gastro-intestinal (stomach
  and intestine) bleeding was higher.


6.4 Marketing authorisation and use 


During 2007, the CHMP considered the evidence from the RE-MODEL and 
RE-NOVATE trials in deciding whether or not to recommend dabigatran for 
licensing in Europe. The CHMP concluded that dabigatran was as effective as 
enoxaparin in preventing blood clots, and safety profiles were similar. It was 
noted that dabigatran was more convenient for patients than enoxaparin because 
it is taken orally, rather than given as an injection. 


The CHMP considered that the benefits of dabigatran outweighed its risks and 
recommended to the European Commission that it be given marketing 
authorisation for use after hip and knee replacement surgery. This was granted 
in March 2008. After completion of the RE-LY study, marketing authorisation for 
stroke prevention in patients with atrial fibrillation followed in August 2011. 


Dabigatran is convenient to use as it can be taken orally in a capsule that acts
immediately and requires no monitoring. Partly for this reason, it is considered to
be an important new anticoagulant. Two alternative anticoagulants have been
mentioned, enoxaparin and warfarin. Enoxaparin is available only as an injection,
making it a more invasive treatment. Warfarin is taken orally, but its use requires
an initial phase of normalisation which requires frequent blood tests. Standard
practice is to give patients an anticoagulant for several weeks after hip or knee
replacement, but their hospital stay might be less than five days. It follows that
dabigatran is more convenient for both patients and medical staff.


The cost-effectiveness of dabigatran for use after hip or knee replacement was
reviewed by NICE during 2008. While dabigatran is a more expensive drug than
other existing treatments, its convenience means it takes up less clinical time.
NICE approved dabigatran as an available option for preventing blood clots after
hip or knee replacement surgery on the NHS (NICE, 2008). It also recommended
further clinical trials to compare dabigatran with other existing anticoagulants.


6.5 Surveillance 


Phase 4 studies use larger samples of patients than can be obtained before 
marketing. They aim to obtain further evidence about the safety of the drug. 


At the end of 2011, the EMA published a press release (EMA, 2011) on the 
safety of dabigatran. There were worldwide reports of bleeding on the drug, 
which in a few cases led to death. Excessive bleeding is a well-known risk with 
any anticoagulant, and evidence from the trials suggested dabigatran had a 
similar safety profile to other anticoagulants. It was noted that with increasing 
worldwide use and awareness, the total number of adverse events reported 
tends to increase. However, the EMA recommended that further checks of a 
patient’s health should be carried out before the patient is prescribed dabigatran. 
The EMA will continue to monitor the safety of the drug.


Exercises on Section 6


Exercise 8 Analysing the RE-NOVATE trial 


Table 10, which gave results of the RE-NOVATE trial, is reproduced below. Carry 
out a $\chi^2$ test of the null hypothesis that there is no difference in
effectiveness in reducing the risk of blood clots (or death) between treatment with
220 mg dabigatran and treatment with 40 mg enoxaparin. Start by forming a suitable
2 × 2 contingency table. You
may carry out your calculations either by hand or using Minitab. 


                                    Drug and dose
                              dabigatran            enoxaparin
                        220 mg       150 mg         40 mg
Major bleeds            23/1146      15/1163        18/1154
Minor bleeds            70/1146      72/1163        74/1154
Blood clots or death    53/880       75/874         60/897


Computer Book: clinical trials


In Section 3 you learnt about crossover trials, matched-pairs trials and 
group-comparative trials. Chapter 11 of the Computer Book tells you how to use 
Minitab to randomise patients to treatments for these clinical trials. It then gives 
you more practice at using Minitab to carry out $\chi^2$ tests and t-tests.


You should work through all of Chapter 11 of the Computer Book now, if you 
have not already done so. 


Summary 


In this unit we looked at the testing of new drugs. The tests have two main aims: 
to determine whether a drug is an effective treatment and to make sure it is safe, 
with no serious adverse side effects. New drugs are tested in clinical trials that 
use a control group. 


Section 2 explained the difficulties of setting up effective controls, and how some 
of these difficulties can be overcome by using placebos. It distinguished 
subjective measurements, such as sensations, from objective measurements, to 
which relatively precise numerical values can be attached. Bias can arise in 
experiments from the expectations of both the experimenters and the subjects, 
but it can be removed by blind and double-blind experiments. Problems also arise 
from the variability of the experimenters and of the people who are the subjects. 


Different types of clinical trial were described in Section 3. One type is the 
crossover trial, where a patient receives both the control (placebo) treatment in 
one time period and the new drug in a different time period. A second type is a 
matched-pairs trial, where each person given the new drug is matched with a
second person who is given the control treatment. Both these forms of trial aim 
to remove some of the variability between people, either by using each person as 
his or her own control, or by matching people with similar attributes. A third type 
of clinical trial is the group-comparative trial. Here people are allocated to the 
control group and treatment group at random, subject to the restriction that each 
group should be representative of the population being studied. 


Section 4 showed that some medical conditions and ailments require categorical 
measurements, others ordinal measurements and others are measured on an 
interval scale. Different statistical tests need to be used for data from these
different types of measurement. This unit illustrated the use of t-tests and $\chi^2$
tests in clinical trial analysis. 


The problems of detecting the side effects of a drug were discussed in Section 5. 
Some side effects are expected and these can be measured by experimental 
procedures similar to those used to measure the desired effects of a drug. Other 
side effects may not be expected, as in the case of practolol, a drug developed 
for treating heart conditions which turned out to damage the eye. Clinical trials 
are necessarily limited in scale and duration, so they may not reveal all the side 
effects of a drug. Post-marketing surveillance is therefore important, and this 
section described various methods of post-marketing surveillance, together with 
problems that are involved in each. 


Section 6 worked through all the major stages in the testing of a new drug. 
Perhaps the overriding messages from this case study are that study designs 
can be complex and results may not be clear-cut. Also, the process of drug 
licensing is slow — first reports of dabigatran were published in 2001, yet it took 
until 2008 to license the drug.


Learning outcomes


After working through this unit, you should be able to: 


• understand the need for controls in a clinical trial and how to control for certain
  factors


• explain what is meant by a placebo and explain how and why placebos are
  used in clinical trials


• explain some of the ethical questions that have to be answered when
  assessing whether a group of patients can be given a placebo


• distinguish between subjective and objective measurements in clinical trials


• identify sources of bias in clinical trials and suggest experimental procedures
  whereby bias can be reduced


• identify sources of variability in clinical trials and recognise when this variability
  can hamper the interpretation of the results from a clinical trial


• describe crossover, matched-pairs and group-comparative designs of clinical
  trials


• recognise those investigations in which each of these designs should be used
  and those in which they should not


• understand some uses of random methods of allocation in the design of an
  experiment, and their importance


• understand that the design and analysis of a clinical trial (or any scientific
  experiment) are closely connected


• distinguish between categorical, ordinal and interval scale measurements, and
  understand their uses in clinical trials


• analyse categorical data from trials using the $\chi^2$ test


• analyse interval scale data from trials using the t-test


• explain why clinical trials may fail to detect all the side effects of a new drug


• describe some methods of post-marketing surveillance, together with their
  advantages and drawbacks.


Solutions to activities


Solution to Activity 1 


The answer is nothing at all. Headaches tend to go away sooner or later anyway. 
Perhaps your headache would have been gone in an hour even if you had not 
taken the drug. 


Solution to Activity 2 


You cannot say much more than in the previous activity. You are more likely to 
believe that aspirin really works if 16 out of 20 headaches improve rather than if 
none improve, but you still do not know how many of the headaches would have 
got better anyway. 


Solution to Activity 3 


Each member of the experimental group has taken a pill, whereas each member 
of the control group has not. This may appear to be a trivial difference, but it is 
not. Every doctor knows that the mere fact of taking a pill can have beneficial 
effects, irrespective of any active ingredients that the pill may contain. 


Solution to Activity 4 


The control group can be given dummy tablets which look like aspirin and taste 
like aspirin but contain no active drug. 


Solution to Activity 5 


Scurvy is a serious illness that can result in death. Hence it would not have been 
ethical to give a placebo — all patients needed to be treated. 


Solution to Activity 6 
Some of the problems are as follows. 


• With either treatment, the headache may take more than an hour to go away, 
but last longer when the placebos are taken instead of the aspirin. 


• With either treatment, the headache might be gone after an hour. However, the 
headache may have gone away in 5 minutes with the aspirin and in 45 minutes 
with the placebos, for example. 


• The headache may be still there but may not be as severe with the aspirin as 
with the placebos. 


Solution to Activity 7 


It probably does not. If a drug gives only a degree of relief which is not easily 
noticeable, then it will not be very useful. Thus it is only quite substantial 
differences that are likely to be of interest. 


Solution to Activity 8 


It certainly does matter, because the people will expect the active tablet to work 
and the dummy tablet to be ineffective. Thus their expectations of the treatment 
will be quite different in the two cases. So it is important to make sure, as far as 
possible, that the subjects do not know which treatment they are receiving. 


Solution to Activity 9 


It still matters, because the doctors also have expectations. They will probably 
expect the active treatment to be better than a placebo, and their expectations 
can often affect how the patients respond to the treatment. 


Solution to Activity 10 


If all treatments were equal, you would expect the two patients given sea-water 
to take longer to recover from the scurvy, so the results from this group may 
appear worse than the other treatment groups. 


Solution to Activity 11 


An independent person could have assigned and given the treatments in a 
private space. Then James Lind would have been blinded to the treatments. 
Blinding patients to what treatment they themselves received would not have 
been possible, but patients could have been blinded to the treatments that other 
patients received. 


Solution to Activity 12 


It would be unsatisfactory because people are variable. Not only do different 
people vary in the extent to which aspirin reduces their headaches, but an 
individual’s response to aspirin can vary from one time to another. For example, 
perhaps with some people the aspirin produces no relief if the headache is very 
severe, but banishes it altogether if the headache is slight. 


Solution to Activity 13 


Measurements should be taken at specific times before and after eating. In 
studies of drugs designed to control glucose levels in diabetics, it is normal to 
take blood glucose or similar measurements after fasting as a baseline. 


Solution to Activity 14 


The measurements for men and women are very similar, so there is little need to 
ensure that the control and experimental groups are of the same sex. However, 
the maximum heart rate decreases quite noticeably with age, so it would be 
important to ensure that the experimental and control groups were either of 
similar age or contained similar proportions of old and young people. 


Solution to Activity 15 


It assumes that the effect of the drug is reversible, i.e. that the patient is 
essentially the same after the treatment has been withdrawn as before it was 
started. In practice, this assumption would need to be tested and it will not hold 
if, for instance, the patient is cured by the treatment. 


Solution to Activity 16 


(a) A crossover trial can be used here. Diabetes is a chronic condition that 
patients will have for the duration of the trial. 


(b) It would not be appropriate to use a crossover design here as scurvy could 
be cured (or have resulted in death) prior to receiving the second diet 
supplement. 


Solution to Activity 17 


Matching first on sex, then smoking status, then age, gives the following pairs: 1 
and 5, 2 and 4, 3 and 7, 6 and 8. If the relative importance of sex, smoking status 
and age changes, these matched pairs will change. 


Solution to Activity 18 


By allocating patients to the two groups at random. 
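
One way such a random allocation could be carried out is sketched below. This is only an illustration: the function name `allocate_at_random`, the patient numbering and the seed are assumptions made for the example, not part of the trial procedures described in this unit.

```python
import random


def allocate_at_random(patient_ids, seed=None):
    """Split patients into two groups of (nearly) equal size at random.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Twenty patients, numbered 1 to 20 (an assumed example).
experimental, control = allocate_at_random(range(1, 21), seed=140)
print("Experimental group:", experimental)
print("Control group:     ", control)
```

Because the split depends only on the shuffle, neither the patients nor the experimenter can influence which group a patient joins.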


Solution to Activity 19 


The null hypothesis would be that the effect of the new drug is not different from 
the effect of the existing drug. The alternative hypothesis would be that the effect 
of the new drug is different from the effect of the existing drug, i.e. that it relieves 
headache pain either more or less effectively. 


Solution to Activity 20 


A two-sided test would be more appropriate. There is a chance that the existing 
drug will be better than the new drug (the existing drug has presumably proved 
better than competitive drugs in the past), but the drug company hopes that the 
new drug will be the better drug. They wish to detect differences between the 
new and the existing drug in both directions. Therefore a two-sided test should 
be used. 


Solution to Activity 21 


(a) Body temperature can be measured on an interval scale, giving interval 
scale data. 


(b) As pain can be rated (from mild to severe), these are ordinal data. 


(c) There are (two) categories, pregnant and non-pregnant, so the information 
gives categorical data. 


(d) Blood pressure measurements give interval scale data. 


Solution to Activity 22 


Yes, there is a major difference in the two selection procedures. In 
post-marketing surveillance, people are allocated to groups after the decision is 
taken as to which patients should be treated with the particular drug. 


Solution to Activity 23 


Combining results for the three groups gives the following contingency table: 


                      Total daily dose
                 <100 mg   150 or 200 mg   >300 mg   Total

Blood clot          11           9             8        28
No blood clot       61          55            81       197
Total               72          64            89       225


The null and alternative hypotheses are: 


H₀: Total daily dose of drug and the number of patients having 
a blood clot are independent. 


H₁: There is a relationship between the total daily dose of drug and 
the number of patients having a blood clot. 


The Expected table is as follows: 


                      Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        8.9600       7.9644      11.0756
No blood clot    63.0400      56.0356      77.9244


Note that the Expected values are greater than 5, so it is appropriate to use the 
χ² test. 


Residual = Observed − Expected, so the following is the Residual table. 


                      Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        2.0400       1.0356      −3.0756
No blood clot    −2.0400      −1.0356       3.0756


For the first cell, the contribution to χ² is given by: 

$$\chi^2\ \text{contribution} = \frac{(\text{Residual})^2}{\text{Expected}} = \frac{(2.0400)^2}{8.9600} \approx 0.4645.$$

Repeating the calculation for all six cells results in this χ² table: 


                      Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        0.4645       0.1347       0.8541
No blood clot     0.0660       0.0191       0.1214


Hence the value of the χ² test statistic is 
0.4645 + 0.1347 + 0.8541 + 0.0660 + 0.0191 + 0.1214 ≈ 1.660. 


As in Example 14, the number of degrees of freedom is (2 − 1) × (3 − 1) = 2, 
so, from Table 3 (Exercises on Section 4), the critical value at the 5% significance 
level is 5.991 and at the 1% significance level it is 9.210. 


Since the test statistic, 1.660, is less than the 5% critical value, 5.991, we cannot 
reject H₀ at the 5% significance level. Hence we conclude that the data provide 
little evidence of a relationship between the total daily drug dose and the number 
of patients getting a clot. 
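
The arithmetic in this solution can be checked with a short program. The sketch below is an illustration rather than a prescribed method: the function name `chi_squared` is made up for the example, and the code simply follows the Expected, Residual and χ² contribution steps set out above, using the Observed counts from the contingency table.

```python
def chi_squared(observed):
    """Return (test statistic, degrees of freedom, Expected table).

    Hypothetical helper that mirrors the hand calculation:
    Expected = row total x column total / overall total, and each cell
    contributes (Observed - Expected)^2 / Expected to the statistic.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: blood clot, no blood clot. Columns: the three dose groups.
observed = [[11, 9, 8], [61, 55, 81]]
statistic, df, expected = chi_squared(observed)
print("Test statistic:", round(statistic, 3))
print("Degrees of freedom:", df)
print("Reject H0 at 5%?", statistic > 5.991)   # 5.991 is taken from Table 3
```

The program works with unrounded Expected values, so its result may differ from the hand calculation in the last decimal place.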


Solution to Activity 24 


As noted at the start of this section, patients who have undergone a hip or knee 
replacement are at high risk of blood clots. Blood clots can be dangerous so it 
would be unethical to give such patients a placebo when there are known, 
established treatments for reducing the risk. 


Solution to Activity 25 


Patients in the treatment group were given a placebo injection in addition to the 
dabigatran capsule, and patients in the control group were given a placebo 
capsule in addition to the enoxaparin injection. 


Solution to Activity 26 


(a) The same questions that were examined in BISTRO I should be examined in 
BISTRO II. Thus the questions to examine are whether the daily dose of 
drug affects the number of patients having (i) a minor bleed, (ii) a major 
bleed and (iii) a blood clot. 


BISTRO I found evidence that the number of patients having a minor bleed 
was affected, so we would expect BISTRO II to replicate that result. 

BISTRO II is a larger trial than BISTRO I so there are more likely to be major 
bleeds. (There were none in BISTRO I.) Also, because it is a larger trial it 
may find evidence that drug dosage affects the number of patients suffering 
a blood clot. 


The null hypotheses would be: 


(i) H₀: Total daily dose of dabigatran and the number of patients having a 
minor bleed are independent. 


(ii) H₀: Total daily dose of dabigatran and the number of patients having a 
major bleed are independent. 


(iii) H₀: Total daily dose of dabigatran and the number of patients having a 
blood clot are independent. 


(b) The primary aim in comparing the two drugs is to discover whether 
dabigatran is better than enoxaparin. That is, is there a daily dose level of 
dabigatran for which it is better than enoxaparin? Thus, for the first dose 
regime, the question is whether taking 50 mg doses of dabigatran twice a 
day is better than taking enoxaparin. Here, ‘better’ relates to the number of 
patients having (i) a minor bleed, (ii) a major bleed and (iii) a blood clot. 


The corresponding hypotheses are: 


(i) H₀: The drug taken (50 mg of dabigatran twice a day or enoxaparin) and 
the number of patients having a minor bleed are independent. 


(ii) H₀: The drug taken (50 mg of dabigatran twice a day or enoxaparin) and 
the number of patients having a major bleed are independent. 


(iii) H₀: The drug taken (50 mg of dabigatran twice a day or enoxaparin) and 
the number of patients having a blood clot are independent. 


The question would be repeated for each of the other dose levels of 
dabigatran that were examined in the trial (150 mg twice a day, 300 mg once 
a day and 225 mg twice a day). 


Solutions to exercises 


Solution to Exercise 1 


(a) Weight loss can be measured on an objective scale (in kilograms per week, 
for example). 


(b) Certain aspects of appetite can be measured only on a subjective scale. 
Researchers might wish to know, for example, whether a drug affected 
people’s subjective impression of their own appetite. On the other hand, it is 
possible to set up objective measures which might accurately reflect these 
subjective impressions: for example, the amount of food eaten (measured by 
weight or calorie content) over a certain period of time. 


(c) This would probably be measured on a subjective scale. People might be 
asked to rate the soreness of their throat on a scale of discomfort, or doctors 
might assess the soreness of the throat on the intensity of the inflammation, 
or the size of the inflamed area. The inflammation could in theory be 
measured objectively, but in practice it would probably not be measured in 
this way. 


(d) This would be measured on a subjective scale. People would probably be 
asked to rate the severity of their discomfort on a numerical scale. 

Solution to Exercise 2 


(a) The severity and nature of the depression, and the individual’s history of 
depression, may vary greatly. For example, some individuals might be 
suffering from long-lasting and extremely severe depression, whilst others 
might be suffering from short-lived and moderate depression. The causes, 
and therefore the cure, of the depression might thus differ. A woman 
suffering from post-natal depression might, for example, respond to a drug 
very differently from a woman suffering from a bereavement. A person with 
a long history of depressive illness might respond to a drug very differently 
from a person who is experiencing depression for the first time. 


(b) People who volunteer to take part in a clinical trial might not be typical of 
people in general suffering from depression. For example, severely 
depressed people might not be willing to volunteer at all. Hence the results 
of these trials on volunteers might not be representative of the results that 
would be obtained in the population of depression-sufferers in general. 


(c) The problem outlined in part (a) would have to be overcome by making sure 
that the experimental and control groups were as similar as possible with 
respect to the severity, nature and history of their depression. The problem 
outlined in part (b) would have to be overcome by selecting subjects for the 
clinical trial who were representative of the population of 
depression-sufferers in general, if this is ethically possible. 


Solution to Exercise 3 


Despite the use of standardised questionnaires, the patients’ responses may be 
markedly affected by their attitude towards the experimenters, and this attitude is 
likely to vary from one patient, and one experimenter, to another. For example, 
the answers that an aggressive experimenter obtained might be different from 
those that would have been obtained from the same patient by a more polite 
experimenter. 


Solution to Exercise 4 


(a) This is an ideal example of a matched-pairs design, but one that is extremely 
difficult to set up, owing to the difficulty of finding enough pairs of identical 
twins suffering from the condition of interest. 


(b) This is a group-comparative design. 


(c) This is a crossover design; each individual crosses over during the course of 
the experiment from one treatment to the other. 


Solution to Exercise 5 


(a) A crossover design would be most appropriate here, provided that there was 
no carry-over effect. 


(b) The only possible option here would be a group-comparative design 
because it would not be ethical to use the extra time required to set up a 
matched-pairs trial. 


(c) A matched-pairs design could be the most appropriate here because it 
might be possible to obtain a lot of volunteers for the trial and to select pairs 
that match for age, sex, how heavily they smoke, etc. If enough matched 
pairs could not be found, then a group-comparative design is best, probably 
using stratified randomisation so that the control group and treatment group 
have similar characteristics for factors thought important. 


Solution to Exercise 6 
(a) The data are categorical. 


(b) The data are tabulated in a suitable contingency table below. 


                 Experimental group   Control group   Total

Conceived                  1                15           16
Not conceived            999               985         1984
Total                   1000              1000         2000


(c) Null hypothesis (H₀): The effects of the new and old contraceptive on 
conception are the same. 


Alternative hypothesis (H₁): The effects of the new and old contraceptive on 
conception are different. 


Expressing these in a form suitable for the χ² test involves looking at the 
experiment in a slightly different way. The drug company was interested in 
determining whether the type of contraceptive used has any effect on the 
number of women who conceived during the one-year period. We should 
therefore set up the null and alternative hypotheses in terms of these 
variables as follows. 


H₀: The type of contraceptive used and the number of women 
who conceived during the one-year period are independent. 


H₁: There is a relationship between the type of contraceptive 
used and the number of women who conceived during 
the one-year period. 

(You may have set up the null and alternative hypotheses using slightly 
different wording. This does not matter as long as you have labelled the 
variables clearly and the hypotheses convey the same meaning.) 

(d) The Expected values are calculated using the method of Unit 8 
(Subsection 4.2), 

$$\text{Expected} = \frac{\text{row total} \times \text{column total}}{\text{overall total}},$$

which gives the following table of Expected values. 


                 Experimental group   Control group

Conceived                  8                 8
Not conceived            992               992


Notice that all the Expected values are greater than 5, so it is appropriate to 
use the χ² test. 


The Residuals are obtained from 

Residual = Observed − Expected. 


Thus, for example, the Residual for the first cell is 1 − 8 = −7. The following 
is the Residual table. 


                 Experimental group   Control group

Conceived                 −7                 7
Not conceived              7                −7


For the first cell, the contribution to χ² is given by 

$$\chi^2\ \text{contribution} = \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(\text{Residual})^2}{\text{Expected}} = \frac{(-7)^2}{8} = 6.125.$$

Repeating the calculation for all four cells results in this χ² table: 


                 Experimental group   Control group

Conceived              6.125              6.125
Not conceived          0.0494             0.0494


where the χ² contributions for the ‘Not conceived’ groups have been 
rounded to four decimal places. 


Hence the value of the χ² test statistic is 
6.125 + 6.125 + 0.0494 + 0.0494 ≈ 12.349. 


The number of degrees of freedom for a 2 × 2 contingency table is 
(2 − 1) × (2 − 1) = 1. 
Hence, from Table 3, the critical value at the 5% significance level is 3.841, 
and at the 1% significance level it is 6.635. 


Since the test statistic, 12.349, is greater than the 1% critical value, 6.635, 
we reject H₀ in favour of H₁ at the 1% significance level and conclude that 
there is strong evidence of a relationship between the type of contraceptive 
used and the number of women who conceived in the one-year period. In 
the terminology of Subsection 4.2, the new and old contraceptives have 
significantly different effects on conception. 


Looking at the contingency table in (b), you can see that the new 
contraceptive results in fewer conceptions. This, together with the result of 
the hypothesis test, means that the drug company should be satisfied that 
the new contraceptive was better than the existing one against which they 
tested it. These results would also be useful evidence to submit with an 
application for a product licence, provided that they were accompanied by 
evidence that there were no side effects and that the trials had been carried 
out carefully — for example, that both groups of women were using the 
contraceptives correctly. (Many oral contraceptives are effective only if they 
are taken regularly at the same time every day.) 
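
As a cross-check on the arithmetic in part (d), the χ² statistic for a 2 × 2 table can also be found in one step. The sketch below uses a standard shortcut identity for 2 × 2 tables. The identity is not part of the method described in this unit, and the function name `chi_squared_2x2` is an assumption made for the example. For a table with first row a, b and second row c, d, and overall total n, the statistic equals n(ad − bc)² divided by the product of the two row totals and the two column totals.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical helper using the shortcut
    n * (a*d - b*c)^2 / (row1 * row2 * col1 * col2),
    which gives the same value as summing the four cell contributions.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Conceived: 1 (experimental), 15 (control); not conceived: 999, 985.
statistic = chi_squared_2x2(1, 15, 999, 985)
print("Test statistic:", round(statistic, 3))
print("Exceeds the 1% critical value 6.635?", statistic > 6.635)
```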


Solution to Exercise 7 


(a) These data are on an interval scale (of beats per minute). Not only is it 
possible to say that a heart rate of 120 beats per minute is larger than one of 
70 beats per minute, it also makes sense to say that it is 50 beats per minute 
larger. 


Since this was a crossover design, the two measurements for each person 
form a matched pair. (Each pair of measurements consists of the heart rates 
for one person — when given the drug and when given the placebo.) The 
sample size is small, so the matched-pairs t-test is appropriate here. 


The necessary assumption is that the population distribution of the 
differences in heart rate reduction (between the placebo and the drug) follow 
a normal distribution. 


The hypotheses for a matched-pairs t-test are: 

$$H_0\colon \mu_d = 0 \quad\text{and}\quad H_1\colon \mu_d \neq 0,$$

where d is the difference between the placebo and drug. 


The two-sided matched-pairs t-test statistic is found using the procedure in 
Subsection 4.2 of Unit 10. The first step is to calculate the differences 
between the pairs of data values, square them, and total these, as in the 
table below. Here we have subtracted the D (drug) values from the P 
(placebo) values. You may have subtracted the P values from the D values, 
in which case your solution will differ slightly from ours. However, you should 
end up with the same conclusion. 


            Heart rate (beats per minute)      Difference

Subject       Placebo       Drug               d        d²
   1            105          90               15       225
   2             88          82                6        36
   3             90          95               −5        25
   4             89          80                9        81
   5             80          88               −8        64
   6            110          75               35      1225
   7             84          83                1         1
   8            100          90               10       100
   Σ            746         683               63      1757


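The mean of the differences is 

$$\bar{d} = \frac{\sum d}{n} = \frac{63}{8} = 7.875.$$

Also 

$$s^2 = \frac{1}{n-1}\left(\sum d^2 - \frac{\left(\sum d\right)^2}{n}\right) = \frac{1}{7}\left(1757 - \frac{63^2}{8}\right) = \frac{1}{7}\,(1757 - 496.125) = 180.125.$$

Thus, the standard deviation is $s = \sqrt{180.125} \approx 13.421$. Hence, 

$$\text{ESE} = \frac{s}{\sqrt{n}} = \frac{13.421}{\sqrt{8}} \approx 4.745.$$

Now, the test statistic is 

$$t = \frac{\bar{d}}{\text{ESE}},$$

as the null hypothesis is $H_0\colon \mu_d = 0$. Thus 

$$t = \frac{7.875}{4.745} \approx 1.660.$$

From Table 4, the critical value at the 5% significance level with n − 1 = 7 
degrees of freedom is 2.365. 
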


Since the test statistic, 1.660, is less than the critical value, 2.365, we cannot 
reject H₀ at the 5% significance level. In the terminology of Subsection 4.2, 
the drug and placebo are not significantly different in their effect on heart 
rate. Thus, from these data there is little evidence that the new drug alters 
people’s heart rate. 


(However, the sample of people used is very small, so it might still be worth 
conducting a larger trial with more people.) 
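
The matched-pairs calculation in this solution can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `matched_pairs_t` is an assumption, and the data are the eight pairs of heart rates from the table above, with differences taken as placebo minus drug.

```python
import math


def matched_pairs_t(first, second):
    """Return (t statistic, degrees of freedom) for matched pairs.

    Hypothetical helper: differences are first - second, the variance
    uses the divisor n - 1, and ESE = s / sqrt(n).
    """
    differences = [x - y for x, y in zip(first, second)]
    n = len(differences)
    mean_d = sum(differences) / n
    sum_sq = sum(d * d for d in differences)
    variance = (sum_sq - sum(differences) ** 2 / n) / (n - 1)
    ese = math.sqrt(variance / n)
    return mean_d / ese, n - 1


placebo = [105, 88, 90, 89, 80, 110, 84, 100]
drug = [90, 82, 95, 80, 88, 75, 83, 90]

t, df = matched_pairs_t(placebo, drug)
print("t =", round(t, 3), "with", df, "degrees of freedom")
print("Reject H0 at 5%?", abs(t) > 2.365)   # 2.365 is taken from Table 4
```

Swapping the two lists changes the sign of t but not its size, which is why subtracting in the other order leads to the same conclusion.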


Solution to Exercise 8 


The contingency table for the data is as follows: 


                         Blood clot
                        Yes       No      Total

Dabigatran 220 mg        53      827        880
Enoxaparin 40 mg         60      837        897

Total                   113     1664       1777


The null and alternative hypotheses are: 


H₀: Daily doses of 220 mg dabigatran and 40 mg enoxaparin are 
equally effective at reducing the risk of a blood clot. 


H₁: Daily doses of 220 mg dabigatran and 40 mg enoxaparin are 
not equally effective at reducing the risk of a blood clot. 


The Expected table is as follows. 


                         Blood clot
                        Yes          No

Dabigatran 220 mg     55.9595     824.0405
Enoxaparin 40 mg      57.0405     839.9595


The Expected values are greater than 5, so it is appropriate to use the χ² test. 


Residual = Observed − Expected, so the following is the Residual table. 


                         Blood clot
                        Yes          No

Dabigatran 220 mg     −2.9595       2.9595
Enoxaparin 40 mg       2.9595      −2.9595


The χ² table is: 


                         Blood clot
                        Yes          No

Dabigatran 220 mg      0.1565       0.0106
Enoxaparin 40 mg       0.1536       0.0104


The χ² statistic is therefore 
χ² = 0.1565 + 0.0106 + 0.1536 + 0.0104 ≈ 0.331. 


The number of degrees of freedom for a 2 × 2 contingency table is 
(2 − 1) × (2 − 1) = 1. So, from Table 3 (Exercises on Section 4), the critical 
value at the 5% significance level is 3.841. As 0.331 < 3.841 we do not reject 
the null hypothesis at the 5% significance level. There is little evidence that daily 
doses of 220 mg dabigatran and 40 mg enoxaparin differ in their effectiveness at 
reducing the risk of a blood clot. 
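
The tabulated critical value can be checked numerically. The sketch below goes slightly beyond the table look-up used in this solution and is offered only as a cross-check. It relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. The function name `upper_tail_df1` is an assumption made for the example.

```python
import math


def upper_tail_df1(x):
    """P(X > x) for a chi-squared variable X with 1 degree of freedom.

    Hypothetical helper: since X = Z^2 for a standard normal Z,
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# The 5% critical value from Table 3 should leave about 0.05 in the tail.
print("Tail beyond 3.841:", round(upper_tail_df1(3.841), 3))

# The observed statistic lies far below the critical value, so the
# probability of a value at least this large is well above 0.05.
print("Tail beyond 0.331:", round(upper_tail_df1(0.331), 3))
```

This shortcut applies only when there is 1 degree of freedom, as for a 2 × 2 table; other tables need the appropriate row of Table 3.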